window size #81

Open
Jundo26 opened this issue Apr 19, 2021 · 1 comment

Comments

@Jundo26

Jundo26 commented Apr 19, 2021

Why is the max window size 500 ms?
Is it because the duration of a word is about 500 ms?

@ljj7975
Member

ljj7975 commented Apr 20, 2021

I think max_window_size can be a misleading name.
It is prefixed with max because samples can have variable length (shorter than max_window_size).

The window is the single unit of a sample that is preprocessed together and fed into the model.
The following code should be self-explanatory:

    def infer(self, audio_data: torch.Tensor) -> bool:
        sequence_present = False
        for window in stride(audio_data, self.max_window_size_ms, self.eval_stride_size_ms, self.sample_rate):
            if window.size(-1) < 1000:
                break
            self.ingest_frame(window.squeeze(0), curr_time=self.curr_time)
            self.curr_time += self.eval_stride_size_ms
            if self.sequence_present(self.curr_time):
                sequence_present = True
                break
        return sequence_present

    @torch.no_grad()
    def ingest_frame(self, x: torch.Tensor, lengths: torch.Tensor = None, curr_time: float = None) -> int:
        self.std = self.std.to(x.device)
        if lengths is None:
            lengths = torch.tensor([x.size(-1)]).to(x.device)
        lengths = self.std.compute_lengths(lengths)
        x = self.zmuv(self.std(x.unsqueeze(0)))
        p = self.model(x, lengths).softmax(-1)[0].cpu().numpy()
        p *= self.inference_weights
        p = p / p.sum()
        label = self._append_probability_frame(p, curr_time=curr_time)
        return label
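
To illustrate why the size is a maximum rather than a fixed length, here is a minimal sketch of how a strided loop might cut audio into windows. The helper `window_bounds` is hypothetical and is not the `stride` function called above; the 250 ms stride and 16 kHz sample rate are assumed values chosen only for the example.

```python
from typing import Iterator, Tuple


def window_bounds(
    num_samples: int, window_ms: int, stride_ms: int, sample_rate: int
) -> Iterator[Tuple[int, int]]:
    """Yield (start, end) sample indices of successive windows.

    Hypothetical helper for illustration only.
    """
    # Convert milliseconds to sample counts with integer arithmetic.
    window_len = window_ms * sample_rate // 1000
    stride_len = stride_ms * sample_rate // 1000
    start = 0
    while start < num_samples:
        # The last windows are clipped at the end of the audio,
        # so they can be shorter than window_len.
        end = min(start + window_len, num_samples)
        yield start, end
        start += stride_len


# One second of audio at an assumed 16 kHz, 500 ms windows, assumed 250 ms stride.
bounds = list(window_bounds(16000, 500, 250, 16000))
lengths = [end - start for start, end in bounds]
```

In this sketch the final window holds only 4000 samples (250 ms), which is the variable-length case the max prefix refers to. The `infer` loop above stops once a window has fewer than 1000 samples.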
