Update README.md #723

Closed
wants to merge 7 commits into from

@calzada calzada commented Aug 8, 2023

No description provided.

@calzada
Author

calzada commented Aug 8, 2023

@matyaskopp @osenova @maciej-ogrodniczuk: PLEASE UPDATE THE BRANCH IF CONSIDERED APPROPRIATE.

Update ParlaMint-ES.v-3.0 readme.md

@matyaskopp
Collaborator

@calzada I quickly went through the README. It can be completed, and the obvious untruths can be fixed too. The README is inserted in the same way as the data, so please follow the guidelines in the contributing file, CONTRIBUTING.md.

Untruths:

Possible extension:

  • government members' acquisition
  • you can explain why the chairman's speeches are not affiliated with the exact person

If you want to include the original ECPC XML, it would probably be better to first describe the conversion to that format and then describe the conversion to ParlaMint TEI:

  • a process of conversion
  • what has not been encoded (change party to group, constituencies of all MPs)
  • what has been newly added (government members, linguistic annotations, coalition/opposition, ...)

@calzada
Author

calzada commented Aug 8, 2023 via email

@calzada
Author

calzada commented Aug 8, 2023 via email

@matyaskopp
Collaborator

  • stanza was not used for annotations
  • Stanza was not used for annotations in ParlaMint-ES.v3.0, but it was used for ParlaMint-ES-2.1. At any rate, I was waiting to check when you finished the annotation. So I will now just say that it was annotated with UDPipe for ParlaMint-ES.v3.0.

Not only UDPipe, but also NameTag. See:
https://github.com/matyaskopp/ParlaMint/blob/6fa360b0d7986319a93e3f801ecbe6ea3d880038/Data/ParlaMint-ES/ParlaMint-ES.ana.xml#L149-L158

         <appInfo>
            <application ident="UDPipe" version="2">
               <label>UDPipe 2 (spanish-ancora-ud-2.10-220711 model)</label>
               <desc xml:lang="en">POS tagging, lemmatization and dependency parsing done with UDPipe 2 (<ref target="http://ufal.mff.cuni.cz/udpipe/2">http://ufal.mff.cuni.cz/udpipe/2</ref>) with spanish-ancora-ud-2.10-220711 model</desc>
            </application>
            <application ident="NameTag" version="2">
               <label>NameTag 2 (spanish-conll-200831 model)</label>
               <desc>Name entity recognition done with NameTag 2 (<ref target="http://ufal.mff.cuni.cz/nametag/2">http://ufal.mff.cuni.cz/nametag/2</ref>) with spanish-conll-200831 model.</desc>
            </application>
         </appInfo>
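As a rough illustration of how the tools recorded in such an `<appInfo>` element could be listed before writing the README, here is a minimal Python sketch. It works on a trimmed copy of the snippet above and ignores the TEI namespace that the real corpus header uses; the name `list_tools` is made up for this example.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the <appInfo> element quoted above. Assumption: no
# namespace handling, although the real header is in the TEI namespace.
APP_INFO = """
<appInfo>
  <application ident="UDPipe" version="2">
    <label>UDPipe 2 (spanish-ancora-ud-2.10-220711 model)</label>
  </application>
  <application ident="NameTag" version="2">
    <label>NameTag 2 (spanish-conll-200831 model)</label>
  </application>
</appInfo>
"""


def list_tools(app_info_xml):
    """Return (ident, version, label) for every <application> element."""
    root = ET.fromstring(app_info_xml)
    return [
        (app.get("ident"), app.get("version"), app.findtext("label"))
        for app in root.iter("application")
    ]


tools = list_tools(APP_INFO)
for ident, version, label in tools:
    print(f"{ident} {version}: {label}")
```

Reading the tool list from the header rather than from memory avoids documenting a tool (such as Stanza) that was only used in an earlier version.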

You can also insert the LINDAT acknowledgement to fulfil the terms of use of the LINDAT tools:

[The work described herein] has [also]* been using [data/tools/services]* provided by 
the LINDAT/CLARIAH-CZ Research Infrastructure (https://lindat.cz), supported by 
the Ministry of Education, Youth and Sports of the Czech Republic (Project No. LM2023062).

  • the conversion to TEI was done at the end of your pipeline; I am not aware of any other quality control.

YES, MONICA REVISED OUR XML FILES TO MAKE SURE CERTAIN MISTAKES WERE ERADICATED. IN FACT, MONICA USED CHATGPT TO THAT EFFECT.

That sounds interesting. It can be mentioned in the documentation, ideally supported by an example where ChatGPT helped. Please also highlight that a human, not the AI, had the final word, so no additional noise was introduced into the data.

  • government members' acquisition:

WHAT DO YOU MEAN BY THIS?

I mean that complete information about the members of the government is not present in the CD format:

  • some government members are missing (definitely those without any speech)
  • affiliation timespan is not present (the affiliation is known only at the time of speech)

I used wget to download the wiki pages and a script, gov-wiki2tei.pl, to extract the information from the HTML tables into TEI.
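The gov-wiki2tei.pl script itself is not shown in this thread. Purely as an illustration of the general idea (HTML table rows in, TEI `<person>` elements with a dated `<affiliation>` out), here is a minimal Python sketch. The four-column layout (name, role, from, to), the `#government.ES` reference, and the names `TableRows` and `rows_to_tei` are all assumptions made for this example, not the real script's behaviour.

```python
from html.parser import HTMLParser
from xml.sax.saxutils import escape, quoteattr


class TableRows(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._cell = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def rows_to_tei(rows):
    """Turn (name, role, from, to) rows into TEI <person> strings."""
    people = []
    for name, role, start, end in rows[1:]:  # rows[0] is the header row
        # "#government.ES" is a hypothetical organisation id.
        attrs = f'role={quoteattr(role)} ref="#government.ES" from={quoteattr(start)}'
        if end:  # an empty "to" cell means the affiliation is still open
            attrs += f" to={quoteattr(end)}"
        people.append(
            f"<person><persName>{escape(name)}</persName>"
            f"<affiliation {attrs}/></person>"
        )
    return people


# Hypothetical sample table; a real wiki page would need its own column mapping.
SAMPLE_HTML = """
<table>
  <tr><th>Name</th><th>Role</th><th>From</th><th>To</th></tr>
  <tr><td><a href="#">Example Person</a></td><td>minister</td><td>2020-01-13</td><td></td></tr>
</table>
"""

parser = TableRows()
parser.feed(SAMPLE_HTML)
tei_people = rows_to_tei(parser.rows)
print("\n".join(tei_people))
```

Taking the `from` and `to` values from the table is what supplies the affiliation timespan that the speech-level data lacks.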

  • what has not been encoded (change party to group, constituencies of all MPs).

WELL, IT IS A SHAME BECAUSE WE DID HAVE ALL THIS INFORMATION, BUT WE USED TOMAZ ERJAVEC'S CONVERSION SINCE TIME WAS TIGHT. THE NEXT VERSION WILL INCLUDE THIS EASILY.

I am unsure whether it is easy, because you should also include the relationship between the party and the parliamentary group. The parliamentary groups also need a full definition (not only an abbreviation).
We will see what you can do in the next version.

  • what has been newly added (government members, linguistic annotations, coalition/opposition, ...):

GOVERNMENT MEMBERS WERE ALREADY ADDED IN PARLAMINT-ES-2.1; LINGUISTIC ANNOTATIONS WERE ADDED IN PARLAMINT-ES-2.1; COALITION/OPPOSITION WAS THERE IN PARLAMINT-ES-2.1. SO THESE ARE NOT NEW INFORMATION ITEMS?

Sorry, I meant the difference between the original ECPC XML and ParlaMint-ES.
And I haven't seen government members in ParlaMint-ES 2.1 (no prime minister, no ministers).

Final question. I do not have GitHub Desktop with me, and this is why I am having problems updating the documentation. I cannot work the way I normally do since I only have a tablet here. I have to update the document online, but then I am forced to open a pull request. Is this all right? Does anyone check my request?

Update the README in the place where you did it now. I will insert it together with this pull request: #692

matyaskopp and others added 6 commits August 9, 2023 17:24
Final update of REadme.md
FInal + 1 update
Final and definite version of update (10-08-23)
Updating readme again. 10-08-23
Yet another refinement. 10-08-23.
@TomazErjavec
Collaborator

I am closing this pull request; I think it is no longer relevant.
