Fix/issue 28 (#29) hello world Datalab
* rephrase text

* remove individual authors

* make subtitle more visible

* fix notebook text

* Update CodeSnippet.tsx

* change notebook filename

* Update TermsOfUseModal.tsx

* Update CodeSnippet.tsx
danieleguido authored Oct 16, 2024
1 parent cacc6d9 commit 6714dc0
Showing 14 changed files with 77 additions and 33 deletions.
3 changes: 1 addition & 2 deletions src/components/CodeSnippet.tsx
@@ -1,7 +1,6 @@
import { useState, useRef, useEffect } from "react"
import ReactCodeMirror, { EditorView } from "@uiw/react-codemirror"
import type { ReactCodeMirrorRef } from "@uiw/react-codemirror"
import { duotoneDark } from "@uiw/codemirror-theme-duotone"
import { python } from "@codemirror/lang-python"
import { Copy, CheckCircle } from "iconoir-react"
import { createTheme } from "@uiw/codemirror-themes"
@@ -18,7 +17,7 @@ export interface CodeSnippetProps {
const myTheme = createTheme({
theme: "light",
settings: {
background: "#fff9f2",
background: "#fff9f250",
backgroundImage: "",
foreground: "#75baff",
caret: "#5d00ff",
2 changes: 1 addition & 1 deletion src/components/GettingStarted.tsx
@@ -44,7 +44,7 @@ const GettingStarted = ({ className = "" }) => {
<div className="badge bg-dark me-2 py-1 px-2 font-weight-extrabold text-primary">
{startNumAfterOptionalSteps}
</div>{" "}
Consult our terms of use
Accept our Terms of Use
</Link>
</li>
<li>
2 changes: 1 addition & 1 deletion src/components/TermsOfUseModal.tsx
@@ -1,4 +1,4 @@
import { useEffect, useRef, useState, type ChangeEvent } from "react"
import { useEffect, useState, type ChangeEvent } from "react"
import AcceptTermsOfUse from "./AcceptTermsOfUse"
import Page from "./Page"
import { Col, Container, Row } from "react-bootstrap"
2 changes: 1 addition & 1 deletion src/components/Wall.tsx
@@ -73,7 +73,7 @@ const Wall = ({
<b>{numberOfAuthors}</b> authors.
</p>
</Col>
<Col md={4}>
<Col md={6} lg={6} xxl={5}>
<h3>
Join us in this early stage of development and help us to improve
the platform.
14 changes: 8 additions & 6 deletions src/content/notebooks/detect-news-agency-with-impresso-model.mdx
@@ -8,39 +8,40 @@ date: 2024-09-18T10:11:47Z
googleColabUrl: https://colab.research.google.com/github/impresso/impresso-datalab-notebooks/blob/main/2-entity/NE_02_newsagencies.ipynb
authors:
- impresso-team
seealso:
- setup
---

{/* cell:0 cell_type:markdown */}
Delivering swift and reliable news since the 1830s and 1840s, news agencies have played a pivotal role both nationally and internationally. However, understanding their precise impact on shaping news content has remained somewhat elusive. Our goal is to illuminate this aspect by identifying news agencies within historical newspaper articles. Using data from newspapers in Switzerland and Luxembourg as part of the impresso project, we've trained our pipeline to recognize these entities.
Delivering swift and reliable news since the 1830s and 1840s, news agencies have played a pivotal role both nationally and internationally. However, understanding their precise impact on shaping news content has remained somewhat elusive. Our goal is to illuminate this aspect by identifying news agencies within historical newspaper articles. Using data from newspapers in Switzerland and Luxembourg as part of the impresso project, we've trained our pipeline to recognize these entities.

If you're here, you likely seek to detect news agency entities in your own text. This notebook will guide you through the process of setting up a workflow to identify specific newspaper or agency mentions within your text.

{/* cell:1 cell_type:markdown */}
Install necessary libraries (if not already installed) and
Install necessary libraries (if not already installed) and
download the necessary NLTK data.

{/* cell:2 cell_type:code */}

```python
!pip install python-dotenv
!pip install transformers
!pip install torch
```

{/* cell:3 cell_type:markdown */}
*Note: This notebook requires `HF_TOKEN` to be set in the environment variables. You can get your token by signing up on the [Hugging Face website](https://huggingface.co/join) and read more in the [official documentation](https://huggingface.co/docs/huggingface_hub/v0.20.2/en/quick-start#environment-variable). We use [dotenv](https://pypi.org/project/python-dotenv/) library to load the HF_TOKEN value from a local .env file*
_Note: This notebook requires `HF_TOKEN` to be set in the environment variables. You can get your token by signing up on the [Hugging Face website](https://huggingface.co/join) and read more in the [official documentation](https://huggingface.co/docs/huggingface_hub/v0.20.2/en/quick-start#environment-variable). We use [dotenv](https://pypi.org/project/python-dotenv/) library to load the HF_TOKEN value from a local .env file_

{/* cell:4 cell_type:code */}

```python
from dotenv import load_dotenv
load_dotenv() # take environment variables from .env.
```
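
As a quick sanity check (a sketch with a made-up token value, not part of the notebook), the loaded variable can be read back with `os.getenv`:

```python
import os

# Illustration only: in the notebook load_dotenv() fills os.environ from a
# local .env file; here a dummy value stands in for a real Hugging Face token.
os.environ.setdefault("HF_TOKEN", "hf_dummy_token")

token = os.getenv("HF_TOKEN")
print(token is not None)  # → True
```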

{/* cell:5 cell_type:markdown */}
Now the fun part: this function downloads the required model and gives you the keys to successfully detect news agencies in your text.
Now the fun part: this function downloads the required model and gives you the keys to successfully detect news agencies in your text.

{/* cell:6 cell_type:code */}

```python
from transformers import is_torch_available
from transformers import pipeline
@@ -56,6 +57,7 @@ nlp = pipeline("newsagency-ner", model="impresso-project/bert-newsagency-ner-fr"
Run the example below to see how it works.

{/* cell:8 cell_type:code */}

```python
# Example
text = "Mon nom est François et j'habite à Paris. (Reuter)"
30 changes: 23 additions & 7 deletions src/content/notebooks/generic-entity-api.mdx
@@ -2,7 +2,7 @@
githubUrl: https://github.com/impresso/impresso-datalab-notebooks/blob/main/2-entity/generic-entity-api.ipynb
authors:
- impresso-team
- EmanuelaBoros
# - EmanuelaBoros
title: Detect Entities and Link them to Wikipedia and Wikidata in a Text through
the Impresso API
sha: 54802fcabc0e32a4a05a1b4f2761a54b9807b0c5
@@ -14,10 +14,13 @@ googleColabUrl: https://colab.research.google.com/github/impresso/impresso-datal
Named entities such as organizations, locations, persons, and temporal expressions play a crucial role in the comprehension and analysis of both historical and contemporary texts. The HIPE-2022 project focuses on named entity recognition and classification (NERC) and entity linking (EL) in multilingual historical documents.

### About HIPE-2022

HIPE-2022 involves processing diverse datasets from historical newspapers and classical commentaries, spanning approximately 200 years and multiple languages. The primary goal is to confront systems with challenges related to multilinguality, domain-specific entities, and varying annotation tag sets.

### Datasets

The HIPE-2022 datasets are based on six primary datasets, but this model was only trained on **hipe2020** in French and German.

- **ajmc**: Classical commentaries in German, French, and English.
- **hipe2020**: Historical newspapers in German, French, and English.
- **letemps**: Historical newspapers in French.
@@ -26,6 +29,7 @@ The HIPE-2022 datasets are based on six primary datasets, but this model was onl
- **sonar**: Historical newspapers in German.

### Annotation Types and Levels

HIPE-2022 employs an IOB tagging scheme (inside-outside-beginning format) for entity annotations. The annotation levels include:

1. **TOKEN**: The annotated token.
@@ -37,6 +41,7 @@ HIPE-2022 employs an IOB tagging scheme (inside-outside-beginning format) for en
7. **NE-NESTED**: Coarse type of the nested entity.
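
The IOB scheme above can be illustrated with a small standalone sketch (hypothetical tokens and tags, not HIPE-2022 data): `B-` opens an entity, `I-` continues it, and `O` marks tokens outside any entity.

```python
tokens = ["Le", "Temps", "appeared", "in", "Geneva"]
tags = ["B-org", "I-org", "O", "O", "B-loc"]

# Group contiguous B-/I- tags into (surface, type) entity spans.
entities = []
current = None
for tok, tag in zip(tokens, tags):
    if tag.startswith("B-"):
        current = {"type": tag[2:], "tokens": [tok]}
        entities.append(current)
    elif tag.startswith("I-") and current is not None:
        current["tokens"].append(tok)
    else:
        current = None

spans = [(" ".join(e["tokens"]), e["type"]) for e in entities]
print(spans)  # → [('Le Temps', 'org'), ('Geneva', 'loc')]
```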

### Getting Started

This notebook will guide you through setting up a workflow to identify named entities within your text using the HIPE-2022 trained pipeline. By leveraging this pipeline, you can detect mentions of people, places, organizations, and temporal expressions, enhancing your analysis and understanding of historical and contemporary documents.

---
@@ -45,23 +50,25 @@ This updated description provides a clear overview of the HIPE-2022 project's go
_Note: This notebook **might** require `HF_TOKEN` to be set in the environment variables. You can get your token by signing up on the [Hugging Face website](https://huggingface.co/join) and read more in the [official documentation](https://huggingface.co/docs/huggingface_hub/v0.20.2/en/quick-start#environment-variable)._

{/* cell:1 cell_type:markdown */}
Install necessary libraries (if not already installed) and
Install necessary libraries (if not already installed) and
download the necessary NLTK data.

{/* cell:2 cell_type:code */}

```python
!pip install transformers
!pip install nltk
!pip install torch
```

{/* cell:3 cell_type:code */}

```python
def print_nicely(results, text):
# Print the timestamp and system ID
print(f"Timestamp: {results.get('ts')}")
print(f"System ID: {results.get('sys_id')}")

entities = results.get('nes', [])
if entities:
print(f"\n{'Entity':<20} {'Type':<15} {'Confidence NER':<15} {'Confidence NEL':<15} {'Start':<5} {'End':<5} {'Wikidata ID':<10} {'Wikipedia Page':<20}")
@@ -72,7 +79,7 @@ def print_nicely(results, text):
wkd_id = entity.get('wkd_id', 'N/A')
wkpedia_pagename = entity.get('wkpedia_pagename', 'N/A')
print(f"{entity['surface']:<20} {entity['type']:<15} {confidence_ner:<15} {confidence_nel:<15} {entity['lOffset']:<5} {entity['rOffset']:<5} {wkd_id:<10} {wkpedia_pagename:<20}")

print("*" * 100)
print('Testing offsets:')
print("*" * 100)
@@ -84,7 +91,7 @@ def print_nicely(results, text):
wkd_id = entity.get('wkd_id', 'N/A')
wkpedia_pagename = entity.get('wkpedia_pagename', 'N/A')
print(f"{text[entity['lOffset']:entity['rOffset']]:<20} {entity['type']:<15} {confidence_ner:<15} {confidence_nel:<15} {entity['lOffset']:<5} {entity['rOffset']:<5} {wkd_id:<10} {wkpedia_pagename:<20}")

print("*" * 100)
print('Testing offsets in the returned text:')
print("*" * 100)
@@ -96,14 +103,15 @@ def print_nicely(results, text):
wkd_id = entity.get('wkd_id', 'N/A')
wkpedia_pagename = entity.get('wkpedia_pagename', 'N/A')
print(f"{results['text'][entity['lOffset']:entity['rOffset']]:<20} {entity['type']:<15} {confidence_ner:<15} {confidence_nel:<15} {entity['lOffset']:<5} {entity['rOffset']:<5} {wkd_id:<10} {wkpedia_pagename:<20}")


```
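
The offset checks in `print_nicely` boil down to string slicing: `text[lOffset:rOffset]` should reproduce the entity's `surface`. A minimal illustration with a made-up entity dict:

```python
text = "Mon nom est François et j'habite à Paris."

# Hypothetical entity payload mirroring the fields print_nicely reads.
entity = {"surface": "Paris", "lOffset": 35, "rOffset": 40}

assert text[entity["lOffset"]:entity["rOffset"]] == entity["surface"]
print("offsets match")
```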

{/* cell:4 cell_type:markdown */}
Now the fun part: this function downloads the required model and gives you the keys to successfully detect entities in your text.
Now the fun part: this function downloads the required model and gives you the keys to successfully detect entities in your text.

{/* cell:5 cell_type:code */}

```python
from utils import get_linked_entities
import requests
@@ -117,41 +125,49 @@ for sentence in sentences:
```
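
Assuming the API response has the shape `print_nicely` consumes (a sketch with invented values, not the documented schema), the entity list lives under the `nes` key:

```python
# Hypothetical response payload; field names follow print_nicely above.
results = {
    "ts": "2024-09-18T10:11:47Z",
    "sys_id": "hipe-2022-demo",
    "nes": [
        {"surface": "Paris", "type": "loc", "lOffset": 35, "rOffset": 40,
         "wkd_id": "Q90", "wkpedia_pagename": "Paris"},
    ],
}

surfaces = [e["surface"] for e in results.get("nes", [])]
print(surfaces)  # → ['Paris']
```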

{/* cell:6 cell_type:code */}

```python

```

{/* cell:7 cell_type:code */}

```python

```

{/* cell:8 cell_type:code */}

```python

```

{/* cell:9 cell_type:code */}

```python

```

{/* cell:10 cell_type:code */}

```python

```

{/* cell:11 cell_type:code */}

```python

```

{/* cell:12 cell_type:code */}

```python

```

{/* cell:13 cell_type:code */}

```python

```
14 changes: 13 additions & 1 deletion src/content/notebooks/impresso-py-collections.mdx
@@ -1,67 +1,79 @@
---
githubUrl: https://github.com/impresso/impresso-py/blob/main/examples/notebooks/collections.ipynb
authors:
- RomanKalyakin
# - RomanKalyakin
- impresso-team
title: Search collections
sha: fbebc19629cfc008a085283e61c0669de326add9
date: 2024-09-18T15:04:39Z
googleColabUrl: https://colab.research.google.com/github/impresso/impresso-py/blob/main/examples/notebooks/collections.ipynb
---

{/* cell:0 cell_type:code */}

```python
from impresso import connect

impresso = connect()
```

{/* cell:1 cell_type:code */}

```python
result = impresso.collections.find()
result
```

{/* cell:2 cell_type:markdown */}

# Get collection

Get metadata of a collection by its ID.

{/* cell:3 cell_type:code */}

```python
result = impresso.collections.get("local-roka-tOrwrOG3")
result
```

{/* cell:4 cell_type:markdown */}

## Get collection items

Get items from a collection by its ID.

{/* cell:5 cell_type:code */}

```python
colection_id = result.raw["uid"]
items = impresso.collections.items(colection_id)
items
```

{/* cell:6 cell_type:markdown */}

## Remove items from collection

{/* cell:7 cell_type:code */}

```python
item_id = items.pydantic.data[0].uid
item_id
```

{/* cell:8 cell_type:code */}

```python
impresso.collections.remove_items(colection_id, [item_id])
```

{/* cell:9 cell_type:markdown */}

## Add items to collection

{/* cell:10 cell_type:code */}

```python
impresso.collections.add_items(colection_id, [item_id])
```
