ScrapeBiblio: PDF Reference Extraction and Verification Library

Powered by ScrapeGraphAI

ScrapeBiblio is a library that extracts references from PDF files, verifies them against the Semantic Scholar, CORE, and BASE databases, and converts PDF content to Markdown.

News 📰

  • ScrapeGraphAI now has its own APIs! Check them out here!

Features

  • Extract text from PDF files
  • Extract references using OpenAI's GPT models
  • Verify references using Semantic Scholar, CORE, and BASE databases
  • Convert PDF content to Markdown format
  • Integration with ScrapeGraph for additional reference checking

Installation

Install ScrapeBiblio using pip:

pip install scrapebiblio

Configuration

Create a .env file in your project root with the following content:

OPENAI_API_KEY=your_openai_api_key
SEMANTIC_SCHOLAR_API_KEY=your_semantic_scholar_api_key
CORE_API_KEY=your_core_api_key
BASE_API_KEY=your_base_api_key
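
A missing key usually surfaces only when an API call fails, so it can help to check the environment up front. The following is a minimal sketch and not part of ScrapeBiblio: `REQUIRED_KEYS` and `missing_keys` are hypothetical names, and treating all four keys as required is an assumption (the CORE and BASE keys may well be optional).

```python
import os

# Variable names taken from the .env example above. Treating all four as
# required is an assumption, not documented library behaviour.
REQUIRED_KEYS = (
    "OPENAI_API_KEY",
    "SEMANTIC_SCHOLAR_API_KEY",
    "CORE_API_KEY",
    "BASE_API_KEY",
)


def missing_keys(env=None):
    """Return the required variable names that are unset or empty in env."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_KEYS if not env.get(name)]


if __name__ == "__main__":
    # Call load_dotenv() first if the keys live in a .env file.
    missing = missing_keys()
    if missing:
        print("Missing API keys:", ", ".join(missing))
```

Passing a plain dictionary instead of relying on `os.environ` keeps the check easy to exercise without touching the real environment.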

Usage

Here's a basic example of how to use ScrapeBiblio:

import os

from dotenv import load_dotenv
from scrapebiblio.core.find_reference import process_pdf

# Load the API keys from the .env file into the environment
load_dotenv()

pdf_path = 'path/to/your/pdf/file.pdf'
output_path = 'references.md'

openai_api_key = os.getenv('OPENAI_API_KEY')
semantic_scholar_api_key = os.getenv('SEMANTIC_SCHOLAR_API_KEY')
core_api_key = os.getenv('CORE_API_KEY')
base_api_key = os.getenv('BASE_API_KEY')

# Extract the references from the PDF and write them to output_path
process_pdf(
    pdf_path, output_path, openai_api_key, semantic_scholar_api_key,
    core_api_key=core_api_key, base_api_key=base_api_key,
)
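
To run the same call over a folder of PDFs, one option is a small wrapper like the sketch below. It is not part of ScrapeBiblio: `plan_batch`, `run_batch`, and the `_references.md` naming scheme are hypothetical, and `process` is assumed to be any callable that accepts a PDF path and an output path.

```python
from pathlib import Path


def plan_batch(pdf_dir, out_dir):
    """Pair every PDF in pdf_dir with a Markdown output path in out_dir."""
    pdf_dir, out_dir = Path(pdf_dir), Path(out_dir)
    return [
        (str(pdf), str(out_dir / f"{pdf.stem}_references.md"))
        for pdf in sorted(pdf_dir.glob("*.pdf"))
    ]


def run_batch(pdf_dir, out_dir, process):
    """Call process(pdf_path, output_path) for each pair and return the pairs."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    jobs = plan_batch(pdf_dir, out_dir)
    for pdf_path, output_path in jobs:
        process(pdf_path, output_path)
    return jobs
```

With the variables from the basic example in scope, the wrapper could be driven by a lambda that forwards to `process_pdf`, for instance `run_batch('papers', 'out', lambda pdf, out: process_pdf(pdf, out, openai_api_key, semantic_scholar_api_key, core_api_key=core_api_key, base_api_key=base_api_key))`. The directory names here are placeholders.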

Advanced Usage

ScrapeBiblio also provides the following functions:

  1. Convert PDF to Markdown:
from scrapebiblio.core.convert_to_md import convert_to_md
convert_to_md(pdf_path, output_path, openai_api_key)
  2. Check references with ScrapeGraph:
from scrapebiblio.utils.api.reference_utils import check_reference_with_scrapegraph
result = check_reference_with_scrapegraph("Reference Title")

Contributing

We welcome contributions! Please see our Contributing Guidelines for more details.

License

This project is licensed under the MIT License. See the LICENSE file for details.
