Fix Parsing Issue #25

SkSumit · 2021-10-10T07:57:22Z

Parsing doesn't work for all resume formats.
Fields like name and education are parsed incorrectly.

the-lightstack · 2021-10-10T11:22:39Z

How can I reproduce that error? (What was your input PDF?)

SkSumit · 2021-10-10T12:23:47Z

Hi @the-lightstack,
You can check out the sample resume folders to test it; the code does not parse these resumes well.
For example, for this resume it outputs the name as Frontend Intern instead of Sumit Kolpekwar, along with other issues.

Yashdew · 2021-10-10T12:38:46Z

@the-lightstack for reference, you can check the JSON output example in README.md. One more thing: kindly open your PR against the dev branch.

the-lightstack · 2021-10-11T11:06:47Z

This seems interesting, but I don't really know what the error is, so I won't fix this issue

Yashdew · 2021-10-11T12:29:57Z

@the-lightstack well, there is no issue as such, but we want to increase the accuracy of the parsing algorithm.

For example:
In education: nothing is returned for most of the resumes.
In experience: it can't differentiate between projects and experience.

File location: https://github.com/Yashdew/Assessor/blob/main/Sample%20Resume/Yash-Dewangan-CV.pdf

At present we are getting this type of JSON from parsing:

[
    {
        "personal_details": {
            "name": "Yash Dewangan",
            "email": "[email protected]",
            "mobile_number": "8602842290"
        },
        "skills": [
            "Github",
            "Architecture",
            "Programming",
            "Pandas",
            "Editing",
            "Database",
            "Analysis",
            "Design",
            "Apis",
            "Ui",
            "C",
            "Coding",
            "C++",
            "Video",
            "Engineering",
            "Information technology",
            "Api",
            "Algorithms",
            "Java",
            "Rest",
            "Statistics",
            "Flask",
            "Django",
            "Apex",
            "Sql server",
            "Js",
            "Photography",
            "Css",
            "Python",
            "Html",
            "Sql"
        ],
        "education": null,
        "experience": [
            "eQ Technologic  | Software Engineer Intern",
            "Aug 2021 – Present",
            "•  Implemented various services/APIs needed",
            "for new features required in latest release",
            "•  Learnt about SOA architecture, modular",
            "coding i.e. keeping future use in mind",
            "•  Implementation of concepts such as Tagging",
            "Entities, Groups/User Authorization &",
            "Permissions for Entities",
            "•  Worked on Backend technologies such as",
            "Spring and Java with SQL Server as",
            "Database"
        ],
        "no_of_pages": 1,
        "links": {
            "linkedin": "https://www.linkedin.com/in/iyashdewangan/",
            "leetcode": "https://leetcode.com/Yashdew/",
            "codechef": "https://www.codechef.com/users/yashdew",
            "codeforces": "http://codeforces.com/profile/yashdewangan123456",
            "github": [
                "https://github.com/Yashdew/Attendance-Tracker",
                "https://github.com/Yashdew",
                "https://github.com/SkSumit/Chatistics"
            ],
            "others": [
                "https://drive.google.com/file/d/1-UrtlUygeujyDXvZPhI5fW9E1wICL_Qd/view",
                "https://chatistics.vercel.app/",
                "https://auth.geeksforgeeks.org/user/yashdewangan123456/practice/",
                "https://attendancesknhc.herokuapp.com/",
                "mailto:[email protected]",
                "https://www.spoj.com/users/yashdew/"
            ]
        },
        "total_experience": 0.17,
        "projects": "Projects TBC",
        "achievements": "Achievements TBC",
        "hobbies": "Hobbies TBC"
    }
]
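In the output above, `education` is null and `projects`, `achievements`, and `hobbies` hold placeholder strings. A minimal sketch of a check that flags such fields, so accuracy can be tracked across the sample resumes, might look like this. The function name `find_unparsed_fields` and the rule "a string ending in TBC is a placeholder" are assumptions made for illustration, not part of the project.

```python
from typing import Any, Dict, List


def find_unparsed_fields(record: Dict[str, Any]) -> List[str]:
    """Return the top-level fields that look empty or still hold a placeholder."""
    unparsed = []
    for field, value in record.items():
        is_empty = value is None or value == []
        # Assumption: placeholder strings end with "TBC", as in the output above.
        is_placeholder = isinstance(value, str) and value.endswith("TBC")
        if is_empty or is_placeholder:
            unparsed.append(field)
    return unparsed


# A trimmed copy of the record shown above.
record = {
    "education": None,
    "experience": ["eQ Technologic  | Software Engineer Intern"],
    "projects": "Projects TBC",
    "achievements": "Achievements TBC",
    "hobbies": "Hobbies TBC",
}
print(find_unparsed_fields(record))
```

Running it over every file in the sample resume folder would give a rough count of how often each field is missing.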

We want this type of JSON from the parsing algorithm:

[
    {
        "personal_details": {
            "name": "Yash Dewangan",
            "email": "[email protected]",
            "mobile_number": "8602842290"
        },
        "skills": [
            "Pandas",
            "Coding",
            "C",
            "Flask",
            "Css",
            "Java",
            "C++",
            "Django",
            "Rest"
        ],
        "education": [
            "SMT. KASHIBAI NAVALE COLLEGE OF ENGINEERING
            BE in Information Technology
            2018-2022 | Pune, MH
            Cum. GPA: 8.14"
        ],
        "experience": [
            "eQ Technologic | Software Engineer Intern
            Aug 2021 – Present
            Implemented various services/APIs needed for new features required in latest release
            Learnt about SOA architecture, modular coding i.e. keeping future use in mind
            Implementation of concepts such as Tagging Entities and  Groups/User Authorization & Permissions for Entities
            Worked on Backend technologies such as Spring and Java with SQL Server as Database"
        ],
        "no_of_pages": 1,
        "links": {
            "linkedin": "https://www.linkedin.com/in/iyashdewangan/",
            "leetcode": "https://leetcode.com/Yashdew/",
            "codechef": "https://www.codechef.com/users/yashdew",
            "codeforces": "http://codeforces.com/profile/yashdewangan123456",
            "github": [
                "https://github.com/Yashdew/Attendance-Tracker",
                "https://github.com/Yashdew",
                "https://github.com/SkSumit/Chatistics"
            ],
            "others": [
                "mailto:[email protected]",
                "https://www.spoj.com/users/yashdew/",
                "https://attendancesknhc.herokuapp.com/",
                "https://chatistics.vercel.app/",
                "https://auth.geeksforgeeks.org/user/yashdewangan123456/practice/"
            ]
        },
        "total_experience": 0.17,
        "projects": [
            "CHATISTICS
            GitHub Live URL
            Dec 2020 - Feb 2021
            An open-source WhatsApp chats analyser and statistics.
            Application, which provides various meaningful insights.
            Time complexity reduces from 20 seconds. to 5 seconds.
            Used Flask for implementing backend REST APIs with firebase database for analysis of traffic.
            Pandas for data pre-processing.
            Used NextJS and Bulma UI for frontend.
            500+ users and 30 stars on GitHub.",

            "ATTENDANCE-TRACKER
            GitHub Live URL
            July 2020 – Aug 2020
            A full-stack web application for monitoring the attendance in Microsoft Teams from logs file of the meeting. (Sample)
            Optimization of code took around 3 seconds in Data pre-processing.
            Worked on building the major backend part and frontend.
            Used Flask for implementing Backend and HTML, CSS & JS for frontend.
            Used Mongo DB and Google sheet API for Database.
            Data pre-processing of large logs files for calculating time stamps of students using pandas
            50+ users in our college."
        ],
        "achievements": [
            "Codechef - Maximum rating 1603 (3-star).",
            "Codechef – March Lunchtime 2021 Div-3, secured a rank of 825 out of 7000+ participants.",
            "Leetcode – 150+ Solved Questions.",
            "250+ Solved Questions on GFG, Codechef, SPOJ and Codeforces.",
            "Participated in Google kickstart 2021 Round A, Round C & Round D.",
            "Secured 1st rank out of 30+ participants in Scaler Edge Apex 2021. (SKN Edition)",
            "Represented Hack Club SKN projects in Hack Club Asia Summit 2021.",
            "Participated in more than 30+ coding competition."
        ],
        "hobbies": [
            "Photography and Video editing",
            "Traveling and exploring new places.",
            "Gaming"
        ]
    }
]
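Part of the gap between the two `experience` arrays is only line wrapping: the current output splits each bullet at the PDF's line breaks, while the desired output has one line per bullet. A rough sketch of merging those wrapped lines follows, assuming every bullet starts with "•" as in the current output. The helper name `merge_wrapped_lines` is hypothetical.

```python
from typing import List

BULLET = "•"


def merge_wrapped_lines(lines: List[str]) -> List[str]:
    """Join continuation lines onto the bullet they belong to."""
    merged: List[str] = []
    in_bullet = False
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(BULLET):
            # A new bullet starts: drop the marker and keep the text.
            merged.append(line.lstrip(BULLET).strip())
            in_bullet = True
        elif in_bullet:
            # No marker after a bullet: treat it as a wrapped continuation.
            merged[-1] = f"{merged[-1]} {line}"
        else:
            # Lines before the first bullet (company, dates) stay as they are.
            merged.append(line)
    return merged


# First lines of the "experience" array from the current output.
experience = [
    "eQ Technologic  | Software Engineer Intern",
    "Aug 2021 – Present",
    "•  Implemented various services/APIs needed",
    "for new features required in latest release",
    "•  Learnt about SOA architecture, modular",
    "coding i.e. keeping future use in mind",
]
for entry in merge_wrapped_lines(experience):
    print(entry)
```

One known limitation of this sketch: a second job header that follows a bullet list has no marker, so it would be glued onto the last bullet. Telling headers apart needs an extra rule (for example a date pattern), which is not attempted here.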

AK9175 · 2021-10-12T04:55:04Z

> This seems interesting, but I don't really know what the error is, so I won't fix this issue

We used a module called pyresparser to extract information from resumes, but unfortunately it misses a few attributes: we want all the information about the projects and achievements of a particular candidate.
So we expect that you either use some other known module or work it out on your own to extract projects and achievements from the resume.

- If working it out on your own

You need to use a PDF-to-text module to get the text from resumes.
Use your data preprocessing / extraction techniques to extract Experience, Projects, and Achievements from resumes with different structures (some resumes are divided horizontally, some vertically, or in some other way).

- If using some modules

Use a module other than pyresparser to extract the data, one that accurately lists Experience, Projects, and Achievements.
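For the "working it out on your own" route, a minimal sketch of the extraction step could split the plain text into sections by heading. It assumes the text has already been pulled out of the PDF (for example with pdfminer.six's `extract_text`) and that each section starts with a heading on its own line. The heading list, the sample text, and the name `split_sections` are assumptions for illustration; real resumes will need more heading variants.

```python
import re
from typing import Dict, List, Optional

# Assumed heading names; extend this for the resumes in the sample folder.
SECTION_HEADINGS = (
    "education", "experience", "projects", "achievements", "hobbies", "skills",
)


def split_sections(text: str) -> Dict[str, List[str]]:
    """Group the lines of a resume under the heading that precedes them."""
    sections: Dict[str, List[str]] = {}
    current: Optional[str] = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        # Normalise "PROJECTS", "Projects:" and similar to "projects".
        key = re.sub(r"[^a-z]", "", line.lower())
        if key in SECTION_HEADINGS:
            current = key
            sections.setdefault(current, [])
        elif current is not None:
            sections[current].append(line)
        # Lines before the first heading (name, contact) are skipped here.
    return sections


# Hypothetical extracted text, loosely based on the sample resume.
sample_text = """Yash Dewangan
EDUCATION
BE in Information Technology
EXPERIENCE
eQ Technologic | Software Engineer Intern
PROJECTS
CHATISTICS
ACHIEVEMENTS
Leetcode – 150+ Solved Questions.
"""
sections = split_sections(sample_text)
for name, lines in sections.items():
    print(name, lines)
```

This only works when the extracted text keeps each section contiguous. For resumes split into side-by-side columns, the text from both columns may come out interleaved, so the columns would have to be separated first; that step is not covered by this sketch.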

SkSumit added the enhancement, good first issue, hacktoberfest, and help wanted labels on Oct 10, 2021
