Nothing being pulled #76

Open
fredrikcoulter opened this issue Mar 14, 2024 · 29 comments

Comments

@fredrikcoulter

fredrikcoulter commented Mar 14, 2024

A week after the last time I used Pull from Duolingo, I've finished another unit and it's time to pull the latest new cards.

I successfully logged in, then it started pulling vocabulary. However, I got an error message that says "Expected value: line 1 column 1 (char 0)".

I've tried a couple of times, getting the same result each time. I also thought that maybe this was due to a lack of cards to download, so I deleted some cards I had suspended. Even after the cards were deleted, I still got the same error. (Don't worry about my studying; these were duplicate words that Duolingo listed twice.)

Has Duolingo screwed around with their system again? Or am I just having a bad day?
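
For context, the wording of this error closely resembles what Python's `json` module raises ("Expecting value: line 1 column 1 (char 0)") when it is asked to parse a body that is empty or not JSON at all. That would be consistent with the server no longer returning the expected JSON, although the thread does not confirm the exact cause. A minimal sketch, where `parse_body` is a hypothetical helper and not part of the plugin:

```python
import json


def parse_body(body: str):
    """Parse a response body, returning None when it is not valid JSON.

    Hypothetical helper for illustration; not taken from the plugin.
    """
    try:
        return json.loads(body)
    except json.JSONDecodeError as err:
        # For an empty or HTML body the message is:
        # "Expecting value: line 1 column 1 (char 0)"
        print(f"Could not parse response: {err}")
        return None


parse_body("")                # empty body
parse_body("<html></html>")   # an HTML error page instead of JSON
parse_body('{"vocab": []}')   # valid JSON parses normally
```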

@KeivanAbdi

I can confirm that I have been having the same issue since last week.

@fredrikcoulter
Author

I want to add that it appears to be a change on Duolingo's side. Duolingo Ninja (https://duolingoninja.com/) claims that something changed on March 9, which is right after I last successfully synced.

@JASchilz
Owner

JASchilz commented Mar 16, 2024 via email

@JASchilz
Owner

Alright! Thanks for reporting this, and I'm seeing the same things that you are.

Given the previous recent changes that made login difficult and now this change, I've removed the plugin from the Ankiweb plugin index. That said, I'll continue to try to maintain it at least for a while here in this repository for manual installation, if a solution comes up to this recent change. My feeling is that Duolingo is probably just not interested in maintaining a stable interface for third-party plugins and that that makes it a bit too unstable to support for the general Anki community.

It looks like the corresponding issue in the Duolingo library that I use in this plugin is KartikTalwar/Duolingo#139. I'll follow that to see if their community has a fix. Right now I'm kind of pessimistic that a fix will come out, but in any case it'll likely take a couple of weeks or more if one does come out. I'll keep an eye on this!

@JASchilz JASchilz pinned this issue Mar 18, 2024
@gigajuwels

@JASchilz a fix has come #141 :)

@JASchilz
Owner

Roger! Will take a look this weekend. :)

@JASchilz
Owner

JASchilz commented Apr 1, 2024

Alright! I took a look. I'm still running into an error when I try integrating it, but my copy of the duolingo library has diverged from Kartik's copy, so it seems likely that the cause is in that divergence.

I haven't had much time to debug this over the weekend, but if you have this method working in Kartik's project then it seems very likely that I can debug any issue in my copy without too much work, hopefully next weekend.

I'm doing my work in branch https://github.com/JASchilz/AnkiSyncDuolingo/tree/fix-from-gigajuwels if anyone else is wanting to take a look.

And thanks @gigajuwels both for the fix and for notifying us in this project. :)

@JASchilz
Owner

JASchilz commented Apr 7, 2024

Hey folks! Just wanted to let you know that I'm continuing to address this issue. I'm able to use @gigajuwels's fix to retrieve vocabulary. However, it's not quite a one-to-one replacement, so it might be a couple of weeks before I get it integrated and this issue closed.

@tomtaylz

tomtaylz commented Jul 5, 2024

Hey there - I had a go at fixing this - Python is definitely not my forte so I could use some feedback. I also have never used Anki or your plugin in its previous working state, so it was a fair bit of guesswork and I'm not sure how migration would go.

Not sure if this is something you still have on your plate -- My solution mostly works (when I hardcode a few things it works great on my account) but I still need to figure out the proper user to pass to the login API I took from your gitlab, and I am looking at how to provide/set the learnLanguage. I also added the audio we get from the new vocab API to the Target as an audio player - although there may be a better way to do this.

Hoping to get these things polished up in a few days if anything for my personal use but happy to open a PR and get feedback too if it's helpful.

@JASchilz
Owner

JASchilz commented Jul 7, 2024

Hey @tomtaylz , thanks for this and your work on moving this issue forward.

There are a couple of challenges here that I'm working with:

  1. One is that I don't use this stuff myself regularly, so there's a big spin up time. I put a few hours today into cleaning up my fork of the duolingo library, with the hopes of making it really easy to maintain with auto documentation and doctests. And I think I've now got a good handle on how to get the necessary information out.
  2. The second issue is that it doesn't look like there's any way to get the "gid" out of the vocabulary words--kind of a unique system identifier for each word--which I was using in this plugin to avoid importing the same word twice. I think I can come up with another new unique identifier, but as it will be different from the old one that might mean that there will be some challenge with duplicate cards when importing from whatever new version of the plugin I release the first time.

Gimme one more day to try to work this out, and then I might accept your help. :)

@JASchilz
Owner

JASchilz commented Jul 7, 2024

OK folks, I've created a release candidate which I can use to pull my words from Duolingo. Check out https://github.com/JASchilz/AnkiSyncDuolingo/releases/tag/3.0.0rc1 and the installation instructions there.

Perhaps the biggest disruptive change, in response to Duolingo's own changes to their service in March, is that the plugin is unable to de-duplicate any cards pulled using a previous version of this plugin. I figure that if this is a hurdle to jump, maybe users can share any strategies here for how to address this.

Also note that I'm unable to retrieve any new words until after you've completed a lesson. In other words, the plugin won't retrieve words right after the first time you encounter them; it will only retrieve words from lessons that you've completed.

I'm also creating an issue to help set expectations about my ability to maintain this plugin and invite any new maintainer who would like to take it on: https://github.com/JASchilz/AnkiSyncDuolingo/issues .

@tomtaylz

tomtaylz commented Jul 7, 2024

Thanks @JASchilz I'll cherry pick some of your work to get my local branch in a better state too. I'm happy to help at least in the short term but full disclosure around EOY there's a likelihood I'll drop off more too (aggressively chasing a language right now to prepare for a trip) - I could create a PR for some cleanup and for my audio additions for feedback.

@JASchilz
Owner

JASchilz commented Jul 7, 2024

@tomtaylz seeing your changes would be appreciated.

I'd consider posting them more for posterity and not for hopes of getting them merged by me. In #78 I call out that I'm not able to give this project the full attention that I'd like to as owner, and invite a new maintainer. Having your code available could be interesting for that new maintainer, but I'm unlikely to be able to give it the diligence of a full review.

@tomtaylz

tomtaylz commented Jul 7, 2024

> @tomtaylz seeing your changes would be appreciated.
>
> I'd consider posting them more for posterity and not for hopes of getting them merged by me. In #78 I call out that I'm not able to give this project the full attention that I'd like to as owner, and invite a new maintainer. Having your code available could be interesting for that new maintainer, but I'm unlikely to be able to give it the diligence of a full review.

Do I need to be added for write permissions? I'm on a new computer, so I just want to verify it's not something wrong with my setup before I go down that rabbit hole, but I hit a 403 :)

@JASchilz
Owner

JASchilz commented Jul 7, 2024

@tomtaylz oh, how did you encounter the 403? I don't quite understand the scenario yet. :)

If you were asking about contributing your branch to the repository, I was thinking more along the lines of if you've got a fork on GitHub of this project you can link to it here so that it's available for other people to see in the future. :)

@tomtaylz

tomtaylz commented Jul 7, 2024

> @tomtaylz oh, how did you encounter the 403? I don't quite understand the scenario yet. :)
>
> If you were asking about contributing your branch to the repository, I was thinking more along the lines of if you've got a fork on GitHub of this project you can link to it here so that it's available for other people to see in the future. :)

ah, I was just trying to push a branch and submit a PR for feedback, but happy to fork and take that approach also.

@tomtaylz

tomtaylz commented Jul 8, 2024

Here's the addition of the audio player - #80

@BNaturelle

BNaturelle commented Jul 9, 2024

I'm pretty new to APIs, but I really loved this library before the change and want to help however I can.

https://www.duolingo.com/2017-06-30/users/{user_id}/courses/{short_language}/en/practice-lexemes
works similarly to the POST "learned-lexemes" from the new get_vocabulary method. If you send in the skillIds one at a time, you can get the vocabulary of that specific unit. It also returns lexemeIds like the old API, but only for the first 5 lexemes for some reason. The words can also be grouped together with unit tags in the Anki deck, since they all come from the same skill. It may be possible to get the lexemeIds with different arguments/JSON payloads, but I haven't found the golden ticket yet.
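
A rough sketch of the "one skill at a time" idea follows. It only builds the URL and the per-skill payloads and sends nothing. Several things here are assumptions rather than facts from the thread: the helper names, the reading of the fixed `en` path segment as the language you are learning from, and the `progressedSkills` payload key.

```python
# URL template as quoted in this comment; "en" is parameterised here on
# the ASSUMPTION that it is the language the course is taught from.
PRACTICE_LEXEMES_URL = (
    "https://www.duolingo.com/2017-06-30/users/{user_id}"
    "/courses/{learning}/{from_lang}/practice-lexemes"
)


def practice_lexemes_url(user_id, learning, from_lang="en"):
    """Build the practice-lexemes URL for one user and course."""
    return PRACTICE_LEXEMES_URL.format(
        user_id=user_id, learning=learning, from_lang=from_lang
    )


def per_skill_payloads(progressed_skills):
    """Yield one payload per skill, so each response covers one skill.

    ASSUMPTION: the payload key is "progressedSkills"; the thread does
    not spell out the payload shape.
    """
    for skill in progressed_skills:
        yield {"progressedSkills": [skill]}


url = practice_lexemes_url(12345, "de")
payloads = list(per_skill_payloads([
    {"finishedLevels": 1, "finishedSessions": 4, "skillId": {"id": "a"}},
    {"finishedLevels": 0, "finishedSessions": 2, "skillId": {"id": "b"}},
]))
```

Each payload would then be sent in its own POST, and the skill it came from could be used as the unit tag for the resulting words.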

https://www.duolingo.com/2017-06-30/words-list/supported-courses
returns JSON listing the Duolingo courses offered in each language. Not useful on its own, but there may be an accessible node hidden somewhere.

I hope someone smarter than me finds this information useful and helpful in restoring this library

@JASchilz
Owner

JASchilz commented Jul 11, 2024 via email

@eriksolg

@JASchilz Thanks a ton for the RC. This plugin is pretty much the only motivation for me to continue using Duolingo.

Hopefully somebody will volunteer to continue maintaining it.

@BNaturelle

BNaturelle commented Jul 11, 2024

3.0.0rc1 works fine, but I'm working to address some of the four "breaking changes" that you listed

> The plugin has no means to avoid pulling words that you already pulled using a previous version of the addon

I'd really like to solve this to make it compatible with previous versions. I spent a lot of time adding details about gender, cases, and conjugation to my old Anki deck generated from this add-on. I think the only way this can be done is with lexemeIds from the hidden Duolingo dictionary (which has yet to be found). I found a partial list with "practice-lexemes" (mentioned above), which seems promising, but it's not a great solution.

I have no idea about the gender or pronunciation fields. We can hope that they'll be accessible from the same endpoint as the lexemeIds, like before. Maybe that's just wishful thinking.

> This will only pull learned words from completed lessons.

Fix for the "get_vocabulary" method in duolingo.py:

# Intended to run inside get_vocabulary() in duolingo.py.
current_courses = self._get_data_by_user_id()["currentCourse"]["pathSectioned"]
progressed_skills_Ids = []  # skill IDs already recorded, to avoid repeats
progressed_skills = []      # entries for the learned-lexemes POST payload

for section in current_courses:
    completedUnits = section["completedUnits"]
    for unitIndex, unit in enumerate(section["units"]):
        # Crawl every completed unit plus the one in progress, then stop.
        if unitIndex > completedUnits:
            break
        for level in unit["levels"]:
            # Only skill-type levels are crawled.
            if level["type"] != "skill":
                continue
            pathLevelClientData = level["pathLevelClientData"]
            if "skillId" in pathLevelClientData:
                levelSkill = [pathLevelClientData["skillId"]]
            elif "skillIds" in pathLevelClientData:
                levelSkill = pathLevelClientData["skillIds"]
            else:
                levelSkill = []
            for levelSkillId in levelSkill:
                if levelSkillId in progressed_skills_Ids:
                    continue
                progressed_skills_Ids.append(levelSkillId)
                if unitIndex < completedUnits:
                    # Fully completed unit: report the skill as finished.
                    finishedLevels = 1
                    finishedSessions = 1234
                else:
                    # Unit in progress: report the sessions done so far.
                    finishedLevels = 0
                    finishedSessions = level["finishedSessions"]
                progressed_skills.append({
                    "finishedLevels": finishedLevels,
                    "finishedSessions": finishedSessions,
                    "skillId": {"id": levelSkillId},
                })

This will create a less bloated list of all progressed skills from partially and fully completed units.
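
To try the same logic outside the plugin, it can be wrapped in a standalone function and run against a small hand-made structure. The following is only a sketch: `collect_progressed_skills` is a hypothetical name, and the sample data is invented to mimic the field names used in the snippet (`completedUnits`, `units`, `levels`, `pathLevelClientData`), not real API output.

```python
def collect_progressed_skills(path_sectioned):
    """Return payload entries for skills in completed and in-progress units."""
    seen_ids = set()
    progressed = []
    for section in path_sectioned:
        completed_units = section["completedUnits"]
        for unit_index, unit in enumerate(section["units"]):
            if unit_index > completed_units:
                break  # past the unit currently in progress
            for level in unit["levels"]:
                if level["type"] != "skill":
                    continue
                client_data = level["pathLevelClientData"]
                if "skillId" in client_data:
                    skill_ids = [client_data["skillId"]]
                else:
                    skill_ids = client_data.get("skillIds", [])
                for skill_id in skill_ids:
                    if skill_id in seen_ids:
                        continue
                    seen_ids.add(skill_id)
                    fully_done = unit_index < completed_units
                    progressed.append({
                        "finishedLevels": 1 if fully_done else 0,
                        "finishedSessions": (
                            1234 if fully_done else level["finishedSessions"]
                        ),
                        "skillId": {"id": skill_id},
                    })
    return progressed


# Invented sample: one finished unit, one in progress, one not started.
sample = [{
    "completedUnits": 1,
    "units": [
        {"levels": [
            {"type": "skill", "finishedSessions": 4,
             "pathLevelClientData": {"skillId": "a"}},
            {"type": "practice", "finishedSessions": 1,
             "pathLevelClientData": {"skillId": "a"}},
        ]},
        {"levels": [
            {"type": "skill", "finishedSessions": 2,
             "pathLevelClientData": {"skillIds": ["a", "b"]}},
        ]},
        {"levels": [
            {"type": "skill", "finishedSessions": 0,
             "pathLevelClientData": {"skillId": "c"}},
        ]},
    ],
}]

result = collect_progressed_skills(sample)
```

With this sample, skill "a" is reported once as finished, "b" is reported with its real session count, and "c" is skipped because its unit has not been reached.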

@JASchilz
Owner

Got it, @BNaturelle, your help is appreciated! Am I right in understanding that the flow would be:

  • Use your code snippet above to retrieve a list of partially and fully completed skills
  • Then use this list against both the practice-lexemes and the learned-lexemes endpoint, to retrieve both the fully learned words and the new words.

Regarding de-duplicating, I suspect that some other kind of solution might be necessary. For example, is there someone that has a plugin that allows you to merge cards somehow based on a field match? If so, it might be possible to pull down the new cards and merge the new GID values onto your old cards. Then the new version of the plugin would be able to de-duplicate them.
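
To make the merge-by-field-match idea concrete, here is a minimal sketch using plain dictionaries in place of Anki notes. The function name and the field names passed in are hypothetical; a real add-on would have to read and write notes through Anki's own API, which is not shown here.

```python
from collections import defaultdict


def merge_ids_by_field(old_notes, new_notes, match_field, id_field):
    """Copy the new identifier onto old notes that match on one field.

    Old notes are updated in place. New notes with no match, or with an
    ambiguous match (several old notes share the value), are returned
    untouched so they can be handled by hand.
    """
    index = defaultdict(list)
    for note in old_notes:
        index[note[match_field].strip().casefold()].append(note)

    unmatched = []
    for new in new_notes:
        candidates = index.get(new[match_field].strip().casefold(), [])
        if len(candidates) == 1:
            candidates[0][id_field] = new[id_field]
        else:
            unmatched.append(new)
    return unmatched


old = [
    {"Target": "Hund", "Gid": "old-1"},
    {"Target": "Bank", "Gid": "old-2"},
    {"Target": "Bank", "Gid": "old-3"},
]
new = [
    {"Target": "hund", "Gid": "hund-de"},
    {"Target": "Bank", "Gid": "bank-de"},
    {"Target": "Katze", "Gid": "katze-de"},
]
leftover = merge_ids_by_field(old, new, match_field="Target", id_field="Gid")
```

Refusing to merge ambiguous matches is a deliberate choice here: guessing between two old cards with the same word could attach the new identifier to the wrong one.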

@BNaturelle

You're partially right. The code snippet does indeed generate a list of partially and fully completed lexemes, and is intended to be used in the payload of a POST request to the learned-lexemes endpoint. It can also be used against the practice-lexemes endpoint, but it returns a vocab list that is much shorter and less useful than the previous endpoint.

The get_vocabulary method written by gigajuwels works fine, but it crawls lesson-type levels as well as practice-type levels (even though lessons are the only ones that introduce new skills), leading to repeated, unnecessary entries in the list. It also only crawls levels in completed units, ignoring completed levels in partially complete units. The snippet provided is a small improvement as it crawls only lesson-type levels, and includes levels in partially completed units.

Sorry if I'm suggesting changes here in an inconvenient way. I'm new to GitHub and collaborative coding in general and don't know how to make forks or suggested changes yet. But the snippet is specifically meant to replace lines 387-422 of duolingo.py in version 3.0.0rc1.

@JASchilz
Owner

JASchilz commented Jul 15, 2024 via email

@dn6150

dn6150 commented Jul 16, 2024

thx for your services

@JASchilz
Owner

@BNaturelle I believe I've integrated your code into the latest release candidate: https://github.com/JASchilz/AnkiSyncDuolingo/releases/tag/3.0.0rc2 .

However, once I started developing on this I found that the same practice words were showing up on both the practice and learned endpoints. I didn't expect that and I'm not sure what the cause is, but I added some de-duplication code so that at least it wouldn't result in multiple notes for the same word.
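
The kind of de-duplication described here might look like the sketch below. It is not the code from the release candidate; `merge_unique` and the `text` field are hypothetical, chosen only to illustrate keeping the first occurrence of each word across two result lists.

```python
def merge_unique(*word_lists, key=lambda word: word["text"].casefold()):
    """Concatenate word lists, keeping only the first entry per key."""
    seen = set()
    merged = []
    for words in word_lists:
        for word in words:
            k = key(word)
            if k not in seen:
                seen.add(k)
                merged.append(word)
    return merged


learned = [{"text": "Hund"}, {"text": "Katze"}]
practice = [{"text": "katze"}, {"text": "Maus"}]
combined = merge_unique(learned, practice)
```

Passing the learned list first means that, when a word appears in both lists, the entry from the learned endpoint is the one that is kept.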

You can take a look at the latest version of that code in this branch: https://github.com/JASchilz/AnkiSyncDuolingo/blob/fix-from-new-library/duolingo_sync/duolingo/duolingo.py#L375-L499

@BNaturelle

BNaturelle commented Jul 22, 2024

This is expected. The practice-lexemes endpoint has several challenges that I haven't addressed yet.

  1. It only returns a maximum of 29 lexemes and does not seem to support pagination or longer outputs (as learned-lexemes does with ?startIndex= or ?limit=). Although the two endpoints take the same payload, practice-lexemes does not seem to accept these arguments. In fact, the POST to practice-lexemes on Duolingo.com uses no arguments at all.
  2. It seems to return a random list of learned words from all completed lessons. It can produce a more useful list if you parse the progressed_skills list and run the request once for each skill/lesson, then concatenate the outputs. However, if there are more than 29 new words in that skill/lesson, it will return an incomplete list of words. This approach is also more than 7 times slower than learned-lexemes, which returns a maximum of 200 words (as opposed to 29).
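
For comparison, paging through learned-lexemes with `startIndex` and `limit` could be sketched as below. The `fetch_page` callable is hypothetical: it stands in for whatever function performs the actual POST, and is replaced here by an in-memory fake so the loop can be run without network access.

```python
import math


def fetch_all(fetch_page, limit=200):
    """Collect every word by requesting pages until a short page arrives."""
    words = []
    start_index = 0
    while True:
        page = fetch_page(start_index=start_index, limit=limit)
        words.extend(page)
        if len(page) < limit:
            break  # last page reached
        start_index += len(page)
    return words


# In-memory stand-in for the real request, for demonstration only.
fake_words = [f"word{i}" for i in range(450)]
calls = []


def fake_fetch_page(start_index, limit):
    calls.append(start_index)
    return fake_words[start_index:start_index + limit]


all_words = fetch_all(fake_fetch_page)

# Requests needed to cover 200 words at 29 words per response.
requests_for_200 = math.ceil(200 / 29)
```

The last line shows where the speed gap comes from: one 200-word response from learned-lexemes corresponds to about seven 29-word responses from practice-lexemes.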

The practice-lexemes endpoint is less useful in almost every way. Its only strength (and the reason I brought it up in the first place) is that it is the only reliable way of generating a word bank with lexemeIds, as the old API did. The lexemeId worked best as a unique identifier and didn't have issues with homographs the way the new "{word}-{lang}" identifiers do. Getting a full list of lexemeIds seems like a pipe dream at this point, mainly because of the unintuitive and undocumented nature of Duo's new API. To make matters worse, in new lessons (https://www.duolingo.com/2017-06-30/sessions) the lexemes are now called "kc"s with "kc_Ids", and lexemeIds are labelled "legacy_id"s of type "lex". I don't even know what kc means or why it's replacing lexemes, but I'm hoping that it means they're transforming their word bank's back end in some meaningful way.
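
The homograph problem can be seen in a tiny example. The records below are invented for illustration (the IDs and field names are hypothetical); German "Bank" is used because it can mean both a bench and a financial bank.

```python
def word_lang_id(word, lang):
    """Build an identifier in the "{word}-{lang}" style."""
    return f"{word}-{lang}"


# Invented records for two distinct lexemes that share a spelling.
lexemes = [
    {"lexemeId": "id-1", "word": "Bank", "meaning": "bench"},
    {"lexemeId": "id-2", "word": "Bank", "meaning": "bank (financial)"},
]

by_word = {word_lang_id(lex["word"], "de"): lex for lex in lexemes}
by_lexeme = {lex["lexemeId"]: lex for lex in lexemes}

# The word-based key collapses both entries into one; the lexeme-based
# key keeps them apart.
print(len(by_word), len(by_lexeme))
```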

I'll probably check back every month or two to see if Duo has implemented anything new that can be scraped, but otherwise I don't see how to improve from here. I haven't been able to scrape gender, pronunciation, etc. at all. Adding lesson tags is very doable, but maybe not something you want to bother implementing. I'd be willing to give it a try if that's something that interests you.

@JASchilz
Owner

JASchilz commented Jul 22, 2024 via email

@BNaturelle

Glad I could help. The unit/skill names are listed in many different subfolders, but I've found the following to be easiest to crawl:
self._get_data_by_user_id()["currentCourse"]["skills"][i][j]["name"] with iterable i,j. Good luck :)
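
As a sketch of that crawl, assuming `skills` is a list of lists whose entries each carry a `name` field (which is what the `[i][j]["name"]` indexing suggests), the names can be flattened as follows. The helper name and the sample data are invented for illustration.

```python
def skill_names(skills):
    """Flatten a list of lists of skill entries into a list of names."""
    return [skill["name"] for row in skills for skill in row]


# Invented sample shaped like ["currentCourse"]["skills"].
sample_skills = [
    [{"name": "Basics 1"}, {"name": "Greetings"}],
    [{"name": "Food"}],
]
names = skill_names(sample_skills)
```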


8 participants