Commit

Merge branch 'master' into cfbs
MariaHei committed Jun 10, 2024
2 parents 39f1710 + 98e5d6b commit e577ab8
Showing 18 changed files with 673 additions and 127 deletions.
1 change: 1 addition & 0 deletions .github/workflows/ci.yml
@@ -2,6 +2,7 @@ name: CI
 on:
   - push
   - pull_request
+  - workflow_dispatch
 jobs:
   test:
     name: Julia ${{ matrix.version }} - ${{ matrix.os }} - ${{ matrix.arch }} - ${{ github.event_name }}
2 changes: 1 addition & 1 deletion Project.toml
@@ -1,7 +1,7 @@
 name = "JudiLing"
 uuid = "b43a184b-0e9d-488b-813a-80fd5dbc9fd8"
 authors = ["Xuefeng Luo", "Maria Heitmeier"]
-version = "0.8.3"
+version = "0.9.0"

 [deps]
 BSON = "fbb218c0-5317-5bc6-957e-2ee96dd4b1f0"
6 changes: 3 additions & 3 deletions README.md
@@ -1,8 +1,8 @@
 # JudiLing

-[![Stable](https://img.shields.io/badge/docs-stable-blue.svg)](https://MegamindHenry.github.io/JudiLing.jl/stable)
-[![Dev](https://img.shields.io/badge/docs-dev-blue.svg)](https://MegamindHenry.github.io/JudiLing.jl/dev)
-[![Build Status](https://github.com/MegamindHenry/JudiLing.jl/workflows/CI/badge.svg)](https://github.com/MegamindHenry/JudiLing.jl/actions)
+[![Stable](https://img.shields.io/badge/docs-stable-blue.svg)](https://quantling.github.io/JudiLing.jl/stable)
+[![Dev](https://img.shields.io/badge/docs-dev-blue.svg)](https://quantling.github.io/JudiLing.jl/dev)
+[![Build Status](https://github.com/quantling/JudiLing.jl/workflows/CI/badge.svg)](https://github.com/quantling/JudiLing.jl/actions)
 [![codecov](https://codecov.io/gh/MegamindHenry/JudiLing.jl/branch/master/graph/badge.svg)](https://codecov.io/gh/MegamindHenry/JudiLing.jl)

 JudiLing: An implementation for Linear Discriminative Learning in Julia
10 changes: 6 additions & 4 deletions docs/src/index.md
@@ -1,7 +1,9 @@
 # JudiLing

-!!! note
-    If you encounter an error like "ERROR: UndefVarError: DataFrame! not defined", this is because our dependency CSV.jl changed their APIs in v0.8. Please use "data = DataFrame(CSV.File(path_to_csv_file))" to read a CSV file and include DataFrames package by "using DataFrames".
+JudiLing: An implementation for Linear Discriminative Learning in Julia
+
+Maintainer: Maria Heitmeier [@MariaHei](https://github.com/MariaHei)
+Original codebase: Xuefeng Luo [@MegamindHenry](https://github.com/MegamindHenry)

 ## Installation

@@ -12,11 +14,11 @@ Pkg.add("JudiLing")
 ```
 For brave adventurers, install test version of JudiLing by:
 ```
-julia> Pkg.add(url="https://github.com/MegamindHenry/JudiLing.jl.git")
+julia> Pkg.add(url="https://github.com/quantling/JudiLing.jl.git")
 ```
 Or from the Julia REPL, type `]` to enter the Pkg REPL mode and run
 ```
-pkg> add https://github.com/MegamindHenry/JudiLing.jl.git
+pkg> add https://github.com/quantling/JudiLing.jl.git
 ```

 ## Running Julia with multiple threads
23 changes: 23 additions & 0 deletions docs/src/man/input.md
@@ -8,4 +8,27 @@ CurrentModule = JudiLing
 load_dataset(filepath::String;
     delim::String=",",
     kargs...)
+loading_data_randomly_split(
+    data_path::String,
+    output_dir_path::String,
+    data_prefix::String;
+    val_sample_size::Int = 0,
+    val_ratio::Float = 0.0,
+    random_seed::Int = 314)
+loading_data_careful_split(
+    data_path::String,
+    data_prefix::String,
+    output_dir_path::String,
+    n_features_columns::Union{Vector{Symbol},Vector{String}};
+    train_sample_size::Int = 0,
+    val_sample_size::Int = 0,
+    val_ratio::Float64 = 0.0,
+    n_grams_target_col::Union{Symbol, String} = :Word,
+    n_grams_tokenized::Bool = false,
+    n_grams_sep_token::Union{Nothing, String} = nothing,
+    grams::Int = 3,
+    n_grams_keep_sep::Bool = false,
+    start_end_token::String = "#",
+    random_seed::Int = 314,
+    verbose::Bool = false)
 ```
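
Review note: the two entries added to `input.md` document functions that split a dataset into training and validation parts. As a rough illustration of what a seeded random split driven by `val_sample_size` or `val_ratio` means, here is a minimal sketch in plain Julia. The helper `random_split_indices` is hypothetical and is not JudiLing's implementation; only the keyword names and the default seed `314` are taken from the signatures in this diff, and the precedence of `val_sample_size` over `val_ratio` is an assumption.

```julia
using Random

# Hypothetical helper (not part of JudiLing): choose row indices for a
# training/validation split, reproducibly for a given seed.
function random_split_indices(n_rows::Int;
                              val_sample_size::Int = 0,
                              val_ratio::Float64 = 0.0,
                              random_seed::Int = 314)
    # Assumption: an explicit sample size takes precedence over the ratio.
    n_val = val_sample_size > 0 ? val_sample_size : round(Int, n_rows * val_ratio)
    0 <= n_val <= n_rows || throw(ArgumentError("validation size out of range"))

    rng = MersenneTwister(random_seed)
    shuffled = randperm(rng, n_rows)        # random permutation of 1:n_rows
    val_idx = sort(shuffled[1:n_val])       # first n_val shuffled rows
    train_idx = sort(shuffled[n_val+1:end]) # all remaining rows
    return train_idx, val_idx
end

train_idx, val_idx = random_split_indices(10; val_ratio = 0.2)
```

Fixing the seed makes the split repeatable, which is presumably why both documented functions expose `random_seed`.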
1 change: 0 additions & 1 deletion docs/src/man/make_cue_matrix.md
@@ -12,7 +12,6 @@ CurrentModule = JudiLing
 make_cue_matrix(data::DataFrame)
 make_cue_matrix(data::DataFrame, cue_obj::Cue_Matrix_Struct)
 make_cue_matrix(data_train::DataFrame, data_val::DataFrame)
-make_cue_matrix(data::DataFrame, pyndl_weights::Pyndl_Weight_Struct)
 make_combined_cue_matrix(data_train, data_val)
 make_cue_matrix_from_CFBS(features::Vector{Vector{T}};
     pad_val::T = 0.,
11 changes: 8 additions & 3 deletions docs/src/man/make_semantic_matrix.md
@@ -9,8 +9,14 @@ CurrentModule = JudiLing
 ```@docs
 PS_Matrix_Struct
 make_pS_matrix
-make_pS_matrix(utterances)
-make_pS_matrix(utterances, utterances_train)
+make_pS_matrix(data)
+make_pS_matrix(data_val, pS_obj)
+make_combined_pS_matrix(
+    data_train,
+    data_val;
+    features_col = :CommunicativeIntention,
+    sep_token = "_",
+)
 ```

 ## Simulate semantic vectors
@@ -25,7 +31,6 @@ CurrentModule = JudiLing
 make_S_matrix(data_train::DataFrame, data_val::DataFrame, base::Vector, inflections::Vector)
 make_S_matrix(data::DataFrame, base::Vector)
 make_S_matrix(data_train::DataFrame, data_val::DataFrame, base::Vector)
-make_S_matrix(data_train::DataFrame, data_val::DataFrame, pyndl_weights::Pyndl_Weight_Struct, n_features_columns::Vector)
 make_S_matrix(data_train::DataFrame, base::Vector, inflections::Vector, L::L_Matrix_Struct)
 make_S_matrix(data_train::DataFrame, data_val::Union{DataFrame, Nothing}, base::Vector, L::L_Matrix_Struct)
 make_S_matrix(data::DataFrame, base::Vector, L::L_Matrix_Struct)
54 changes: 51 additions & 3 deletions docs/src/man/pyndl.md
@@ -2,9 +2,57 @@
 CurrentModule = JudiLing
 ```

-# Preprocess
+JudiLing is able to call the python package [pyndl](https://github.com/quantling/pyndl) internally to compute NDL models. pyndl uses event files to compute the mapping matrices, which have to be generated manually or by using pyndl in Python, see documentation [here](https://pyndl.readthedocs.io/en/latest/#creating-grapheme-clusters-from-corpus-data).
+The advantage of calling pyndl from JudiLing is that the resulting weights, cue and semantic matrices can be directly translated into JudiLing format and further processing can be done in JudiLing.
+
+!!! note
+    For pyndl to be available in JudiLing, PyCall has to be imported before JudiLing:
+    ```julia
+    using PyCall
+    using JudiLing
+    ```
+
+## Calling pyndl from JudiLing

 ```@docs
 Pyndl_Weight_Struct
-pyndl(data_path)
-```
+pyndl(
+    data_path::String;
+    alpha::Float64 = 0.1,
+    betas::Tuple{Float64,Float64} = (0.1, 0.1),
+    method::String = "openmp"
+)
+```
+
+## Translating output of pyndl to cue and semantic matrices in JudiLing
+
+With the weights in hand, the cue and semantic matrices can be computed:
+
+```@docs
+make_cue_matrix(
+    data::DataFrame,
+    pyndl_weights::Pyndl_Weight_Struct;
+    grams = 3,
+    target_col = "Words",
+    tokenized = false,
+    sep_token = nothing,
+    keep_sep = false,
+    start_end_token = "#",
+    verbose = false,
+)
+make_S_matrix(
+    data::DataFrame,
+    pyndl_weights::Pyndl_Weight_Struct,
+    n_features_columns::Vector;
+    tokenized::Bool=false,
+    sep_token::String="_"
+)
+make_S_matrix(
+    data_train::DataFrame,
+    data_val::DataFrame,
+    pyndl_weights::Pyndl_Weight_Struct,
+    n_features_columns::Vector;
+    tokenized::Bool=false,
+    sep_token::String="_"
+)
+```
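
Review note: the `make_cue_matrix` signature documented here exposes `grams = 3` and `start_end_token = "#"`, which suggests that word forms are cut into overlapping n-gram cues after boundary markers are added. The sketch below shows that idea for the untokenized, character-based case. The helper `ngram_cues` is hypothetical and not part of JudiLing; how JudiLing or pyndl actually builds cues may differ.

```julia
# Hypothetical helper (not part of JudiLing): cut a word form into
# overlapping character n-grams after adding boundary markers.
function ngram_cues(word::AbstractString;
                    grams::Int = 3,
                    start_end_token::AbstractString = "#")
    chars = collect(start_end_token * word * start_end_token)
    n = length(chars)
    n < grams && return String[]   # too short to form a single n-gram
    return [String(chars[i:i+grams-1]) for i in 1:(n - grams + 1)]
end

cues = ngram_cues("word")   # ["#wo", "wor", "ord", "rd#"]
```

Collecting the string into characters first keeps the slicing safe for non-ASCII word forms, where byte indices and character positions differ.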
17 changes: 14 additions & 3 deletions docs/src/man/wh.md
@@ -5,6 +5,17 @@ CurrentModule = JudiLing
 # Utils

 ```@docs
-wh_learn(X, Y)
-make_learn_seq(freq)
-```
+wh_learn(
+    X,
+    Y;
+    eta = 0.01,
+    n_epochs = 1,
+    weights = nothing,
+    learn_seq = nothing,
+    save_history = false,
+    history_cols = nothing,
+    history_rows = nothing,
+    verbose = false,
+)
+make_learn_seq(freq; random_seed = 314)
+```
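
Review note: `wh_learn` now documents the keyword arguments `eta`, `n_epochs` and `learn_seq`. For readers unfamiliar with the Widrow-Hoff (delta) rule, the sketch below shows the standard incremental update, W ← W + η·xᵀ(y − xW), applied one learning event at a time. The function `widrow_hoff` is hypothetical and is not JudiLing's implementation; the keyword names mirror the documented signature, while starting from a zero matrix and visiting rows in order when no sequence is given are assumptions.

```julia
# Hypothetical sketch (not JudiLing's implementation) of the Widrow-Hoff
# (delta) rule: update the weight matrix one learning event at a time.
function widrow_hoff(X::AbstractMatrix, Y::AbstractMatrix;
                     eta::Float64 = 0.01,
                     n_epochs::Int = 1,
                     learn_seq = nothing)
    W = zeros(Float64, size(X, 2), size(Y, 2))
    # Assumption: without an explicit sequence, visit the rows in order.
    seq = learn_seq === nothing ? (1:size(X, 1)) : learn_seq
    for _ in 1:n_epochs, i in seq
        x = X[i:i, :]                     # 1 x n_cues row
        y = Y[i:i, :]                     # 1 x n_outcomes row
        W .+= eta .* (x' * (y .- x * W))  # error-driven update
    end
    return W
end

X = [1.0 0.0; 0.0 1.0]
Y = [1.0 0.0; 0.0 1.0]
W = widrow_hoff(X, Y; eta = 0.1, n_epochs = 200)
```

A learning sequence such as `[1, 1, 2]` presents the first row twice and the second once, which is one way token frequencies can shape the learned weights.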