
MS loader #181

Open
marziarivi opened this issue Jan 20, 2017 · 8 comments


@marziarivi
Contributor

Hi Simon,

I noticed that Montblanc reads the number of antennas from an MS as the number of rows in the ANTENNA table. However, some of these rows sometimes have flag = 1, i.e. they should be ignored.
They can be removed from the MS using CASA, but I'm pointing it out in case you want to handle it in the code.
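To illustrate the point, here is a minimal sketch of selecting only the unflagged antennas. It assumes the ANTENNA subtable's FLAG_ROW column has already been read into a list of booleans (for example with python-casacore's `table("my.ms/ANTENNA").getcol("FLAG_ROW")`); the function name `active_antennas` is hypothetical and not part of Montblanc.

```python
def active_antennas(flag_row):
    """Return indices of antennas whose FLAG_ROW entry is False.

    flag_row: sequence of booleans, one per row of the ANTENNA subtable
    (hypothetically read via python-casacore, e.g.
    table("my.ms/ANTENNA").getcol("FLAG_ROW")).
    """
    return [ant for ant, flagged in enumerate(flag_row) if not flagged]


# Example: 4 antennas in the table, antenna 2 flagged.
flags = [False, False, True, False]
print(len(flags))              # total rows in the ANTENNA table
print(active_antennas(flags))  # unflagged antenna indices
```

Returning indices rather than just a count keeps the original antenna IDs, which is what the ANTENNA1/ANTENNA2 columns of the main table refer to.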

Cheers,
Marzia

@sjperkins
Member

sjperkins commented Jan 20, 2017

Thanks for the heads-up. We've been having a series of robust discussions about handling flagged or missing data. The new source/sink provider interface in the new version should make this fairly configurable for the user.

I guess the correct behaviour is to flag any baselines the antenna is involved with?
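A minimal sketch of that behaviour, assuming the ANTENNA1 and ANTENNA2 columns of the main table have been read into plain Python lists. The function name is hypothetical and this is not Montblanc's implementation.

```python
def flag_rows_for_antennas(antenna1, antenna2, flagged_antennas):
    """Return a per-row flag that is True when either antenna of the
    row's baseline is in flagged_antennas.

    antenna1, antenna2: per-row antenna indices (ANTENNA1/ANTENNA2 columns).
    flagged_antennas: collection of antenna indices to flag.
    """
    bad = set(flagged_antennas)
    return [a1 in bad or a2 in bad for a1, a2 in zip(antenna1, antenna2)]


# Example: three baselines among antennas 0, 1, 2; antenna 2 is flagged.
ant1 = [0, 0, 1]
ant2 = [1, 2, 2]
print(flag_rows_for_antennas(ant1, ant2, [2]))  # [False, True, True]
```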

In general I am opposed to removing data from Measurement Sets these days, as it makes fitting and reasoning about portions of the problem on GPUs/compute nodes difficult and introduces multiple code paths. My attitude is that it's better to massage the MS by filling in (and flagging) missing data points.

/cc @JSKenyon for interest's sake

@marziarivi
Contributor Author

In my case those antennas are not used in the rest of the data. But Montblanc assumed a number of antennas different from the real one, because it doesn't consider the flag column. I assume that the MS must be consistent with the actual number of antennas and baselines, even if some baselines could be flagged as well...

@sjperkins
Member

sjperkins commented Jan 20, 2017


This is more relaxed in the new version, in the sense that we no longer strictly require nbl == na*(na-1)/2 (or nbl == na*(na+1)/2 with autocorrelations). So I don't think this will be a problem going forward.
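The two relations come from counting antenna pairs: without autocorrelations each baseline is an unordered pair of distinct antennas, and with autocorrelations each antenna is also paired with itself. A small sketch that enumerates the pairs with the standard library and checks both formulas (the function name `baselines` is illustrative only):

```python
from itertools import combinations, combinations_with_replacement


def baselines(na, auto_correlations=False):
    """Enumerate (antenna1, antenna2) pairs for na antennas.

    Without autocorrelations: antenna1 < antenna2.
    With autocorrelations: antenna1 <= antenna2.
    """
    ants = range(na)
    if auto_correlations:
        return list(combinations_with_replacement(ants, 2))
    return list(combinations(ants, 2))


for na in (3, 7, 64):
    assert len(baselines(na)) == na * (na - 1) // 2
    assert len(baselines(na, auto_correlations=True)) == na * (na + 1) // 2

print(len(baselines(7)))                          # 21
print(len(baselines(7, auto_correlations=True)))  # 28
```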

@sjperkins
Member

Updated my previous comment.

@marziarivi
Contributor Author

Yes, I know, because I am using v5. However, there is a problem when the number of MS rows differs from ntime x nbl x nchan, which makes sense. So I am not sure this must be handled within the MS...
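For reference, in the MS main table each row holds one baseline at one timestamp, with the channel and correlation axes stored inside the row's DATA cell. So for a single field and spectral window, a complete, regularly gridded MS has ntime x nbl rows. A minimal sketch of such a completeness check, assuming TIME, ANTENNA1 and ANTENNA2 have already been read into lists; the function name `is_complete` is hypothetical.

```python
def is_complete(time, antenna1, antenna2, nbl):
    """Check that the main table has exactly one row per (timestamp, baseline).

    Assumes a single field and spectral window. If exactly nbl distinct
    baselines appear, keys are unique, and there are ntime * nbl rows,
    then every (timestamp, baseline) combination is present.
    """
    ntime = len(set(time))
    bls = set(zip(antenna1, antenna2))
    keys = set(zip(time, antenna1, antenna2))
    return len(bls) == nbl and len(keys) == len(time) == ntime * nbl


# Two timestamps, three baselines, one row missing at time 1.
time = [0, 0, 0, 1, 1]
ant1 = [0, 0, 1, 0, 0]
ant2 = [1, 2, 2, 1, 2]
print(is_complete(time, ant1, ant2, nbl=3))  # False
```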

@sjperkins
Member


We're still having a think about this. The two solutions we're thinking of are:

  1. Support it somehow in the SourceProviders (loaders)
  2. Provide a script to fill and flag the missing data points.
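Option 2 might look roughly like the following. This is a hypothetical sketch that works on in-memory (time, antenna1, antenna2) tuples rather than on an actual MS, ignores autocorrelations, and leaves out writing the new rows back (e.g. with python-casacore).

```python
from itertools import combinations, product


def fill_and_flag(rows, na):
    """Return a complete, sorted list of (time, antenna1, antenna2, flag).

    rows: iterable of (time, antenna1, antenna2) tuples present in the MS.
    na: total number of antennas in the ANTENNA table (flagged or not).
    Missing (time, baseline) combinations are added with flag=True.
    Existing rows get flag=False here; a real script would keep their
    original FLAG values instead.
    """
    present = set(rows)
    times = sorted({t for t, _, _ in present})
    bls = list(combinations(range(na), 2))  # no autocorrelations
    return [(t, a1, a2, (t, a1, a2) not in present)
            for t, (a1, a2) in product(times, bls)]


rows = [(0, 0, 1), (0, 0, 2), (0, 1, 2), (1, 0, 1)]
for row in fill_and_flag(rows, na=3):
    print(row)
```

Filling to the full ntime x nbl grid keeps the shape of the data fixed, which is what makes it easy to split across GPUs or compute nodes.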

@JSKenyon

In my opinion, the baselines associated with flagged antennas should not be removed from the MS. Instead, those rows should be flagged, preserving the ntime, nbl, nchan dimensions you mentioned. Unfortunately, dealing with information that has been removed entirely poses certain problems for performant code. As Simon mentioned, it is possible to handle this in the SourceProviders by ensuring that any omitted information is reintroduced and flagged.
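Reintroducing omitted rows inside a provider amounts to scattering whatever rows exist onto a fixed (ntime, nbl) grid and flagging the gaps. A minimal, hypothetical sketch in plain Python; the function name is illustrative, autocorrelations are ignored, and a real provider would use arrays and also carry the channel and correlation axes.

```python
from itertools import combinations


def to_dense(time, antenna1, antenna2, values, na, fill=0.0):
    """Scatter per-row values onto a dense (ntime, nbl) grid plus a flag grid.

    Cells with no corresponding row get value `fill` and flag True, so
    downstream code always sees the full ntime x nbl shape.
    """
    times = sorted(set(time))
    t_index = {t: i for i, t in enumerate(times)}
    bl_index = {bl: i for i, bl in enumerate(combinations(range(na), 2))}
    nbl = len(bl_index)
    data = [[fill] * nbl for _ in times]
    flags = [[True] * nbl for _ in times]
    for t, a1, a2, v in zip(time, antenna1, antenna2, values):
        ti, bi = t_index[t], bl_index[(a1, a2)]
        data[ti][bi] = v
        flags[ti][bi] = False
    return data, flags


data, flags = to_dense([0, 0, 1], [0, 1, 0], [1, 2, 2], [1.0, 2.0, 3.0], na=3)
print(data)   # [[1.0, 0.0, 2.0], [0.0, 3.0, 0.0]]
print(flags)  # [[False, True, False], [True, False, True]]
```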

@marziarivi I am not sure what you mean when you say "real" number of antennas. Are you referring to the number of unflagged antennas? To me, the true number of antennas is always the total number of antennas in the array, regardless of whether they are functioning properly or not.

@rdeane

rdeane commented Jan 23, 2017

I would strongly support @sjperkins' option 2. This crops up all the time in other applications when we switch from simulated data to real data.


4 participants