You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Hi, below is code that uses a Flask API to run inference on a quantized Llama 3 model. It works fine, but when a new request arrives while an old request is still being processed, the system below fails. Each request takes around 1 to 2 minutes on an RTX 3070 GPU. Here is the code:
from flask import Flask, request, jsonify
from llama_cpp import Llama
import timeit
from PyPDF2 import PdfReader
import os
import json
import pdfplumber
def extract_text_from_pdf_pdfplumber(pdf_path):
    """Extract the text of the FIRST page of a PDF using pdfplumber.

    Only the first page is read (the loop breaks immediately) — presumably
    deliberate, to bound the LLM prompt length; confirm if multi-page
    statements must be supported.

    :param pdf_path: filesystem path to the PDF to read.
    :return: the first page's text, or '' if the page has no text layer.
    """
    text = ''
    with pdfplumber.open(pdf_path) as pdf:
        for page in pdf.pages:
            # extract_text() returns None for pages without extractable
            # text; coalesce to '' to avoid a TypeError on concatenation.
            text += page.extract_text() or ''
            break
    return text
# `name` is undefined (NameError at import time); Flask expects the
# module's __name__ so it can locate resources relative to this file.
app = Flask(__name__)
@app.route('/bank_pdf_to_json', methods=['POST'])
def extract_info():
    """Accept an uploaded bank-statement PDF and return its data as JSON.

    Flow: save the upload, extract the first page's text, prompt the
    module-level ``llm`` (llama_cpp.Llama, initialized at startup) to emit
    structured JSON, persist the result under ``output/``, and return it.

    Returns:
        - 400 JSON error when no file / empty filename is supplied.
        - A plain-string message when the model's JSON output is truncated
          or malformed.
        - Otherwise the extracted JSON object.
    """
    if 'file' not in request.files:
        return jsonify({"error": "No file part"}), 400
    file = request.files['file']
    if file.filename == '':
        return jsonify({"error": "No selected file"}), 400

    # Ensure working directories exist (the original crashed with
    # FileNotFoundError on a fresh checkout), and sanitize the
    # client-supplied filename: os.path.basename strips any directory
    # components, blocking path traversal like "../../etc/passwd".
    os.makedirs("uploads", exist_ok=True)
    os.makedirs("output", exist_ok=True)
    safe_name = os.path.basename(file.filename)
    file_path = os.path.join("uploads", safe_name)
    file.save(file_path)

    try:
        text = extract_text_from_pdf_pdfplumber(file_path)
        # Dashes confuse the model's field parsing; flatten them to spaces.
        text1 = text.replace('-', ' ')
        prompt1 = f'''You are given a text from bank statement pdf as Input text below.
Input: {text1}'
your task is to extract the following details from this text :
Details are to be extarcted : Transaction Amount, Current Balance, Narration, Transaction Type, Cheque Ref Number, transactionAccountName, Remark, Value Date, Closing Balance, Transaction Date, Suggested, Select and To and then final output will must be in json
format, where details like Transaction Amount, Current Balance, Narration, Transaction Type, Cheque Ref Number, transactionAccountName, Remark, Value Date, Closing Balance, Transaction Date,Suggested, Select and To will be key and
their corrosponding data will be value. Remember transaction amount can not negative.
you can take the review of output that will looks like below desired format
Desired format:
Data: {{"Data":["Transaction Amount":null,"Current Balance":null,"Narration":null,"Transaction Type":null,"Cheque Ref No.":null, "transactionAccountName":null,"Remark":null,"Value Date":null,"Closing Balance":null,"Transaction Date":null,"Suggested":null, "Select":null, "To":null]}}
you are restricted to give the response in json format only...
'''
        start = timeit.default_timer()
        x = llm.create_chat_completion(
            messages=[
                {
                    "role": "system",
                    "content": "You are a helpful assistant that outputs in JSON.",
                },
                {"role": "user", "content": prompt1},
            ],
            response_format={"type": "json_object"},
            temperature=0.0,  # deterministic extraction
        )
        try:
            result_json = json.loads(x['choices'][0]['message']['content'])
        except (KeyError, IndexError, json.JSONDecodeError):
            # Narrowed from a bare `except:`: only catch a malformed /
            # truncated model response, not unrelated bugs or KeyboardInterrupt.
            return "Json generation stops abruptly. Reduce number of words"
        end = timeit.default_timer()

        output_file_path = os.path.join(
            "output", f"{os.path.splitext(safe_name)[0]}_bank.json"
        )
        with open(output_file_path, 'w') as json_file:
            json.dump(result_json, json_file, indent=4)
        return jsonify(result_json)
    finally:
        # Clean up the uploaded file on EVERY path; the original leaked it
        # whenever extraction or the LLM call raised.
        if os.path.exists(file_path):
            os.remove(file_path)
How can I make the above code handle concurrent requests in (near) real time?
reacted with thumbs up emoji reacted with thumbs down emoji reacted with laugh emoji reacted with hooray emoji reacted with confused emoji reacted with heart emoji reacted with rocket emoji reacted with eyes emoji
-
Hi, below is code that uses a Flask API to run inference on a quantized Llama 3 model. It works fine, but when a new request arrives while an old request is still being processed, the system below fails. Each request takes around 1 to 2 minutes on an RTX 3070 GPU. Here is the code:
from flask import Flask, request, jsonify
from llama_cpp import Llama
import timeit
from PyPDF2 import PdfReader
import os
import json
import pdfplumber
def extract_text_from_pdf_pdfplumber(pdf_path):
    """Extract the text of the FIRST page of a PDF using pdfplumber.

    Only the first page is read (the loop breaks immediately) — presumably
    deliberate, to bound the LLM prompt length; confirm if multi-page
    statements must be supported.

    :param pdf_path: filesystem path to the PDF to read.
    :return: the first page's text, or '' if the page has no text layer.
    """
    text = ''
    with pdfplumber.open(pdf_path) as pdf:
        for page in pdf.pages:
            # extract_text() returns None for pages without extractable
            # text; coalesce to '' to avoid a TypeError on concatenation.
            text += page.extract_text() or ''
            break
    return text
# `name` is undefined (NameError at import time); Flask expects the
# module's __name__ so it can locate resources relative to this file.
app = Flask(__name__)
# Path to your Llama model (the '#' on these comment lines was lost when
# the code was pasted; without it they are syntax errors).
model_path = r"models\Meta-Llama-3-8B-Instruct-Q6_K.gguf"

# Initialize the Llama model ONCE at startup: loading GGUF weights onto the
# GPU is expensive, so per-request construction would add minutes of latency.
# n_gpu_layers=-1 offloads all layers to the GPU.
llm = Llama(model_path=model_path, n_ctx=8192, n_gpu_layers=-1)
@app.route('/bank_pdf_to_json', methods=['POST'])
def extract_info():
if 'file' not in request.files:
return jsonify({"error": "No file part"}), 400
Beta Was this translation helpful? Give feedback.
All reactions