Fix Parsing Issue #25

Open
SkSumit opened this issue Oct 10, 2021 · 6 comments
Labels
enhancement New feature or request good first issue Good for newcomers hacktoberfest help wanted Extra attention is needed

Comments

@SkSumit
Collaborator

SkSumit commented Oct 10, 2021

  • Parsing doesn't work for all resume formats.
  • Fields such as name and education are parsed incorrectly.
@SkSumit SkSumit added enhancement New feature or request good first issue Good for newcomers hacktoberfest help wanted Extra attention is needed labels Oct 10, 2021
@the-lightstack

How can I reproduce that error? What was your input PDF?

@SkSumit
Collaborator Author

SkSumit commented Oct 10, 2021

Hi @the-lightstack,
You can check out the sample resume folder to test it; the code does not parse these resumes well.
For example, for this resume the parser outputs the name as "Frontend Intern" instead of "Sumit Kolpekwar", along with other issues.
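One way to avoid picking up a role such as "Frontend Intern" as the name is to skip candidate lines that contain job-title words. The sketch below assumes the resume text has already been extracted line by line; `guess_name` and the `TITLE_WORDS` list are hypothetical names for illustration, not part of pyresparser or this repository.

```python
import re

# Hypothetical list of words that mark a line as a job title or heading
# rather than a person's name; extend it for the resumes you test against.
TITLE_WORDS = {
    "intern", "engineer", "developer", "frontend", "backend", "designer",
    "student", "resume", "curriculum", "vitae", "manager", "analyst",
}

# Two to four capitalised words, e.g. "Sumit Kolpekwar".
NAME_PATTERN = re.compile(r"^[A-Z][a-zA-Z'.-]+(?: [A-Z][a-zA-Z'.-]+){1,3}$")


def guess_name(lines):
    """Return the first line that looks like a person's name, or None."""
    for line in lines:
        candidate = " ".join(line.split())  # collapse repeated whitespace
        if not NAME_PATTERN.match(candidate):
            continue
        words = {w.lower() for w in candidate.split()}
        if words & TITLE_WORDS:
            continue  # e.g. "Frontend Intern" is a role, not a name
        return candidate
    return None
```

With the lines `["Frontend Intern", "Sumit Kolpekwar", "Pune, MH"]` this returns `"Sumit Kolpekwar"`. It is only a heuristic: the word list has to grow as new resume layouts turn up.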

@Yashdew
Owner

Yashdew commented Oct 10, 2021

@the-lightstack For reference, see the JSON output example in README.md. One more thing: kindly open your PR against the dev branch.

@the-lightstack

This seems interesting, but I don't really know what the error is, so I won't fix this issue.

@Yashdew
Owner

Yashdew commented Oct 11, 2021

@the-lightstack Well, there is no error as such; we want to improve the accuracy of the parsing algorithm.

For example:
- Education: nothing is extracted for most resumes.
- Experience: it can't differentiate between projects and experience.

File location: https://github.com/Yashdew/Assessor/blob/main/Sample%20Resume/Yash-Dewangan-CV.pdf

At present, parsing produces this JSON:
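Both problems come down to not knowing where one section ends and the next begins. A minimal sketch, assuming each section starts with a heading on its own line (such as "EDUCATION" or "PROJECTS"): group every line under the most recent heading. `split_sections` and the heading map are illustrative names, not existing project code.

```python
import re

# Hypothetical mapping from heading text to output field; real resumes use
# many variants ("Work Experience", "Academic Projects", ...), so extend it.
SECTION_HEADINGS = {
    "education": "education",
    "experience": "experience",
    "work experience": "experience",
    "internships": "experience",
    "projects": "projects",
    "achievements": "achievements",
    "hobbies": "hobbies",
    "skills": "skills",
}


def split_sections(text):
    """Group resume lines under the most recent recognised heading."""
    sections = {}
    current = None  # lines before the first heading (name, contact) are skipped
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        # Normalise "PROJECTS:" / "Projects" to "projects" before lookup.
        key = re.sub(r"[^a-z ]", "", line.lower()).strip()
        if key in SECTION_HEADINGS:
            current = SECTION_HEADINGS[key]
            sections.setdefault(current, [])
        elif current is not None:
            sections[current].append(line)
    return sections
```

Because projects and experience land in separate buckets, they are no longer mixed, and `education` is only empty if no education heading was found.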

[
    {
        "personal_details": {
            "name": "Yash Dewangan",
            "email": "[email protected]",
            "mobile_number": "8602842290"
        },
        "skills": [
            "Github",
            "Architecture",
            "Programming",
            "Pandas",
            "Editing",
            "Database",
            "Analysis",
            "Design",
            "Apis",
            "Ui",
            "C",
            "Coding",
            "C++",
            "Video",
            "Engineering",
            "Information technology",
            "Api",
            "Algorithms",
            "Java",
            "Rest",
            "Statistics",
            "Flask",
            "Django",
            "Apex",
            "Sql server",
            "Js",
            "Photography",
            "Css",
            "Python",
            "Html",
            "Sql"
        ],
        "education": null,
        "experience": [
            "eQ Technologic  | Software Engineer Intern",
            "Aug 2021 – Present",
            "•  Implemented various services/APIs needed",
            "for new features required in latest release",
            "•  Learnt about SOA architecture, modular",
            "coding i.e. keeping future use in mind",
            "•  Implementation of concepts such as Tagging",
            "Entities, Groups/User Authorization &",
            "Permissions for Entities",
            "•  Worked on Backend technologies such as",
            "Spring and Java with SQL Server as",
            "Database"
        ],
        "no_of_pages": 1,
        "links": {
            "linkedin": "https://www.linkedin.com/in/iyashdewangan/",
            "leetcode": "https://leetcode.com/Yashdew/",
            "codechef": "https://www.codechef.com/users/yashdew",
            "codeforces": "http://codeforces.com/profile/yashdewangan123456",
            "github": [
                "https://github.com/Yashdew/Attendance-Tracker",
                "https://github.com/Yashdew",
                "https://github.com/SkSumit/Chatistics"
            ],
            "others": [
                "https://drive.google.com/file/d/1-UrtlUygeujyDXvZPhI5fW9E1wICL_Qd/view",
                "https://chatistics.vercel.app/",
                "https://auth.geeksforgeeks.org/user/yashdewangan123456/practice/",
                "https://attendancesknhc.herokuapp.com/",
                "mailto:[email protected]",
                "https://www.spoj.com/users/yashdew/"
            ]
        },
        "total_experience": 0.17,
        "projects": "Projects TBC",
        "achievements": "Achievements TBC",
        "hobbies": "Hobbies TBC"
    }
]
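In the current output, `experience` is a list of raw lines in which each bullet point is cut wherever the PDF line wrapped. A sketch of re-joining those fragments, assuming each bullet starts with a glyph such as `•` and that a line without one continues the previous bullet (`merge_wrapped_lines` is a hypothetical helper):

```python
BULLETS = ("•", "-", "*", "▪")  # assumed bullet glyphs


def merge_wrapped_lines(fragments):
    """Re-join bullet points that the PDF extractor split across lines."""
    merged = []
    in_bullet = False
    for fragment in fragments:
        text = " ".join(fragment.split())  # collapse runs of whitespace
        if text.startswith(BULLETS):
            # Start a new bullet and drop the bullet glyph itself.
            merged.append(text.lstrip("".join(BULLETS)).strip())
            in_bullet = True
        elif in_bullet and merged:
            merged[-1] += " " + text  # continuation of the previous bullet
        else:
            merged.append(text)  # header lines such as company and dates
    return merged
```

Joining the result with `"\n"` gives one string per job, close to the requested format. The assumption breaks when a second company header follows the last bullet of the first job, so the section should be split into per-job entries first.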

We want the parsing algorithm to produce JSON like this:

[
    {
        "personal_details": {
            "name": "Yash Dewangan",
            "email": "[email protected]",
            "mobile_number": "8602842290"
        },
        "skills": [
            "Pandas",
            "Coding",
            "C",
            "Flask",
            "Css",
            "Java",
            "C++",
            "Django",
            "Rest"
        ],
        "education": [
            "SMT. KASHIBAI NAVALE COLLEGE OF ENGINEERING
            BE in Information Technology
            2018-2022 | Pune, MH
            Cum. GPA: 8.14"
        ],
        "experience": [
            "eQ Technologic | Software Engineer Intern
            Aug 2021 – Present
            Implemented various services/APIs needed for new features required in latest release
            Learnt about SOA architecture, modular coding i.e. keeping future use in mind
            Implementation of concepts such as Tagging Entities and Groups/User Authorization & Permissions for Entities
            Worked on Backend technologies such as Spring and Java with SQL Server as Database"
        ],
        "no_of_pages": 1,
        "links": {
            "linkedin": "https://www.linkedin.com/in/iyashdewangan/",
            "leetcode": "https://leetcode.com/Yashdew/",
            "codechef": "https://www.codechef.com/users/yashdew",
            "codeforces": "http://codeforces.com/profile/yashdewangan123456",
            "github": [
                "https://github.com/Yashdew/Attendance-Tracker",
                "https://github.com/Yashdew",
                "https://github.com/SkSumit/Chatistics"
            ],
            "others": [
                "mailto:[email protected]",
                "https://www.spoj.com/users/yashdew/",
                "https://attendancesknhc.herokuapp.com/",
                "https://chatistics.vercel.app/",
                "https://auth.geeksforgeeks.org/user/yashdewangan123456/practice/"
            ]
        },
        "total_experience": 0.17,
        "projects": [
            "CHATISTICS
            GitHub Live URL
            Dec 2020 - Feb 2021
            An open-source WhatsApp chats analyser and statistics.
            Application, which provides various meaningful insights.
            Time complexity reduces from 20 seconds. to 5 seconds.
            Used Flask for implementing backend REST APIs with firebase database for analysis of traffic.
            Pandas for data pre-processing.
            Used NextJS and Bulma UI for frontend.
            500+ users and 30 stars on GitHub.",

            "ATTENDANCE-TRACKER
            GitHub Live URL
            July 2020 – Aug 2020
            A full-stack web application for monitoring the attendance in Microsoft Teams from logs file of the meeting. (Sample)
            Optimization of code took around 3 seconds in Data pre-processing.
            Worked on building the major backend part and frontend.
            Used Flask for implementing Backend and HTML, CSS & JS for frontend.
            Used Mongo DB and Google sheet API for Database.
            Data pre-processing of large logs files for calculating time stamps of students using pandas
            50+ users in our college."
        ],
        "achievements": [
            "Codechef - Maximum rating 1603 (3-star).",
            "Codechef – March Lunchtime 2021 Div-3, secured a rank of 825 out of 7000+ participants.",
            "Leetcode – 150+ Solved Questions.",
            "250+ Solved Questions on GFG, Codechef, SPOJ and Codeforces.",
            "Participated in Google kickstart 2021 Round A, Round C & Round D.",
            "Secured 1st rank out of 30+ participants in Scaler Edge Apex 2021. (SKN Edition)",
            "Represented Hack Club SKN projects in Hack Club Asia Summit 2021.",
            "Participated in more than 30+ coding competition."
        ],
        "hobbies": [
            "Photography and Video editing",
            "Traveling and exploring new places.",
            "Gaming"
        ]
    }
]
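The desired `projects` (and `education`) values hold one string per entry, and in this resume each entry starts with an all-caps title line such as "CHATISTICS" or "ATTENDANCE-TRACKER". Assuming that convention holds, a section's lines can be split into entries like this (`split_entries` is a hypothetical helper):

```python
def split_entries(lines):
    """Split a section's lines into entries, starting a new entry at each
    all-caps title line (e.g. "CHATISTICS", "ATTENDANCE-TRACKER")."""
    entries = []
    for line in lines:
        letters = [c for c in line if c.isalpha()]
        is_title = bool(letters) and all(c.isupper() for c in letters)
        if is_title or not entries:
            entries.append([line])  # a title line opens a new entry
        else:
            entries[-1].append(line)  # everything else belongs to the current entry
    return ["\n".join(entry) for entry in entries]
```

A short all-caps line inside an entry (for example "REST API") would wrongly start a new entry, so this is a heuristic for resumes that follow this layout, not a general rule.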

@AK9175
Collaborator

AK9175 commented Oct 12, 2021

> This seems interesting, but I don't really know what the error is, so I won't fix this issue.

We used a module called pyresparser to extract information from resumes, but unfortunately it misses a few attributes: we want to extract all of the projects and achievements a candidate lists.
So we're hoping you can either use another well-known module or write your own logic to extract projects and achievements from the resume.

- If working it out on your own:

  1. Use a PDF-to-text module to get the text from resumes.
  2. Use your own data preprocessing/extraction techniques to extract Experience, Projects and Achievements from resumes with different layouts (some resumes are divided horizontally, some vertically, and so on).

- If using an existing module:

  1. Use a module other than pyresparser that accurately lists Experience, Projects and Achievements.
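For vertically divided (two-column) resumes, plain text extraction often interleaves the two columns line by line. One approach, assuming your PDF library can report each word's position (pdfplumber, for example, exposes word coordinates), is to split words at a vertical boundary and read the left column before the right. The `Word` record and the fixed 50% split below are assumptions for illustration:

```python
from dataclasses import dataclass


@dataclass
class Word:
    # Hypothetical word record: the word's text, its left edge (x0)
    # and its top edge (top) in page coordinates.
    text: str
    x0: float
    top: float


def reading_order(words, page_width, split_ratio=0.5):
    """Return the lines of a two-column page, left column first.

    Assumes a single vertical split at split_ratio * page_width; a word
    whose left edge is left of the split belongs to the left column.
    """
    split_x = page_width * split_ratio
    left = [w for w in words if w.x0 < split_x]
    right = [w for w in words if w.x0 >= split_x]

    def column_lines(column):
        # Words sharing a (rounded) top edge form one line, read left to right.
        lines = {}
        for w in sorted(column, key=lambda w: (round(w.top), w.x0)):
            lines.setdefault(round(w.top), []).append(w.text)
        return [" ".join(parts) for parts in lines.values()]

    return column_lines(left) + column_lines(right)
```

Single-column pages should skip the split, and in practice the boundary may need to be detected from the gap between word positions rather than fixed at the page midpoint.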

Projects
None yet
Development

No branches or pull requests

4 participants