
TypeError: fileChunks is not iterable (cannot read property undefined) #11

Open
KarimovMurodilla opened this issue Dec 23, 2024 · 6 comments

Comments

@KarimovMurodilla

I have .pdf files inside the target folder. I am running this command to extract the files' data into knowledges.json:
npx folder2knowledge C:\Users\user\OneDrive\Desktop\misc\eliza\data

But I'm getting this error:
Error processing directory C:\Users\karim\OneDrive\Desktop\misc\eliza\data: TypeError: fileChunks is not iterable (cannot read property undefined)

How can I fix it?

@LinuxIsCool

Getting the same error.

@robertcedwards

I had to make some changes in order for it to work. I'll propose a PR and see if they accept it.

@Ramitphi

@robertcedwards it would be great if you could share some info about the changes you made to get it working

@robertcedwards

robertcedwards commented Dec 23, 2024

You bet @Ramitphi
I'm going to share my raw folder2knowledge.js script here…give the changes a go and let me know.
I'll definitely clone and open a pull request, but hopefully this can help you now. I also made a few changes to ignore .DS_Store and other hidden files.


import pdf2md from '@opendocsg/pdf2md';
import dotenv from 'dotenv';
import fs from 'fs/promises';
import os from 'os';
import path from 'path';
import readline from 'readline';

dotenv.config();

// The first argument from the command line is the starting path
const startingPath = process.argv[2];

const tmpDir = path.join(os.homedir(), 'tmp', '.eliza');
const envPath = path.join(tmpDir, '.env');

// Ensure the tmp directory and .env file exist
const ensureTmpDirAndEnv = async () => {
  await fs.mkdir(tmpDir, { recursive: true });
  if (!await fs.access(envPath).then(() => true).catch(() => false)) {
    await fs.writeFile(envPath, '');
  }
};

const saveApiKey = async (apiKey) => {
  const envConfig = dotenv.parse(await fs.readFile(envPath, 'utf-8'));
  envConfig.OPENAI_API_KEY = apiKey;
  await fs.writeFile(envPath, Object.entries(envConfig).map(([key, value]) => `${key}=${value}`).join('\n'));
};

const loadApiKey = async () => {
  const envConfig = dotenv.parse(await fs.readFile(envPath, 'utf-8'));
  return envConfig.OPENAI_API_KEY;
};

const validateApiKey = (apiKey) => {
  return apiKey && apiKey.trim().startsWith('sk-');
};

const promptForApiKey = () => {
  const rl = readline.createInterface({
    input: process.stdin,
    output: process.stdout
  });

  return new Promise((resolve) => {
    rl.question('Enter your OpenAI API key: ', (answer) => {
      rl.close();
      resolve(answer);
    });
  });
};

const getApiKey = async () => {
  // Check process.env first
  if (validateApiKey(process.env.OPENAI_API_KEY)) {
    return process.env.OPENAI_API_KEY;
  }

  // Check cache in tmpdir
  const cachedKey = await loadApiKey();
  if (validateApiKey(cachedKey)) {
    return cachedKey;
  }

  // Prompt user if no valid key found
  const newKey = await promptForApiKey();
  if (validateApiKey(newKey)) {
    await saveApiKey(newKey);
    return newKey;
  } else {
    console.error('Invalid API key provided. Exiting.');
    process.exit(1);
  }
};

const processDocument = async (filePath) => {
  console.log(`Processing file: ${filePath}`);

  let content;
  const fileExtension = path.extname(filePath).toLowerCase();

  try {
    if (fileExtension === '.pdf') {
      const buffer = await fs.readFile(filePath);
      const uint8Array = new Uint8Array(buffer);
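      // pdf2md converts the PDF bytes to markdown text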
      content = await pdf2md(uint8Array);
    } else {
      content = await fs.readFile(filePath, 'utf8');
    }

    // Split content into chunks (e.g., by paragraphs)
    const chunks = content.split(/\n\n+/).filter(chunk => chunk.trim());

    return {
      document: {
        path: filePath,
        content: content,
        type: fileExtension.slice(1) || 'txt'
      },
      chunks: chunks.map(chunk => ({
        content: chunk.trim(),
        source: filePath
      }))
    };
  } catch (error) {
    console.error(`Error processing file ${filePath}:`, error);
    return {
      document: null,
      chunks: []
    };
  }
};

// Asynchronous function to recursively find files and process them
const findAndProcessFiles = async (dirPath) => {
  try {
    const filesAndDirectories = await fs.readdir(dirPath, {
      withFileTypes: true,
    });

    const documents = [];
    const chunks = [];

    for (const dirent of filesAndDirectories) {
      // Skip .DS_Store and other hidden files
      if (dirent.name.startsWith('.')) {
        continue;
      }

      const fullPath = path.join(dirPath, dirent.name);

      if (dirent.isDirectory()) {
        const { docs, chks } = await findAndProcessFiles(fullPath);
        documents.push(...docs);
        chunks.push(...chks);
      } else if (dirent.isFile()) {
        const result = await processDocument(fullPath);
        if (result.document) {
          documents.push(result.document);
        }
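        // Only append chunks when they are a real array, so a file that failed
        // to parse never feeds undefined into the spread below (the likely
        // source of the original "fileChunks is not iterable" error)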
        if (result.chunks && Array.isArray(result.chunks)) {
          chunks.push(...result.chunks);
        }
      }
    }

    return { docs: documents, chks: chunks };
  } catch (error) {
    console.error(`Error processing directory ${dirPath}: ${error}`);
    return { docs: [], chks: [] };
  }
};

const promptForPath = () => {
  const rl = readline.createInterface({
    input: process.stdin,
    output: process.stdout
  });

  return new Promise((resolve) => {
    rl.question('Please enter a starting path: ', (answer) => {
      rl.close();
      resolve(answer);
    });
  });
};

// Main function to kick off the script
const main = async () => {
  try {
    await ensureTmpDirAndEnv();
    const apiKey = await getApiKey();
    process.env.OPENAI_API_KEY = apiKey;

    let targetPath = startingPath;

    if (!targetPath) {
      targetPath = await promptForPath();
    }

    if (!targetPath) {
      console.log('No starting path provided. Exiting.');
      return;
    }

    console.log(`Searching for files in: ${targetPath}`);
    const { docs, chks } = await findAndProcessFiles(targetPath);

    const output = {
      documents: docs,
      chunks: chks
    };

    // Save the output to knowledge.json
    await fs.writeFile('knowledge.json', JSON.stringify(output, null, 2));

    console.log('Done processing files and saved memories to knowledge.json.');
  } catch (error) {
    console.error('Error during script execution:', error);
    process.exit(1);
  }
};

// Execute the main function
main();
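
To try it, save the script locally (folder2knowledge.js is just an example name) and run it directly with Node instead of through npx. Since it uses ES module imports, your package.json needs "type": "module" (or give the file a .mjs extension):

node folder2knowledge.js C:\Users\user\OneDrive\Desktop\misc\eliza\data

It should walk the folder, skip hidden files like .DS_Store, and write knowledge.json into the directory you ran it from.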

@Ramitphi

Thanks for the reply @robertcedwards.
It did work for me, but when I copied the generated character JSON to add it to my character.ts file, it throws an error:

[Screenshot of the error, 2024-12-25]

@robertcedwards

I ended up using the character generator to make the knowledge work with the character file. Definitely a workaround, not best practice:
https://elizagen.howieduhzit.best/
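
If you'd rather skip the generator, one possible direct route (just a sketch, assuming your character file takes knowledge as a plain array of strings, which is how the default Eliza character files I've seen define it; knowledge-strings.json below is a made-up name) is to flatten the chunks from knowledge.json into that array:

import fs from 'fs/promises';

// Read the folder2knowledge output and keep only the chunk text
const { chunks } = JSON.parse(await fs.readFile('knowledge.json', 'utf-8'));
const knowledge = chunks.map((chunk) => chunk.content);

// Write a plain string array you can paste into the character's knowledge field
// (knowledge-strings.json is just an example output name)
await fs.writeFile('knowledge-strings.json', JSON.stringify(knowledge, null, 2));

No guarantee that's exactly the shape your character.ts expects, but it matches the default character format I've worked with.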
