Household ID needs to be integer? #3

Open
ppdewolf opened this issue Oct 9, 2019 · 4 comments

Comments

@ppdewolf

ppdewolf commented Oct 9, 2019

In my microdata, the householdID has up to 12 digits.
When I apply record swapping, I get negative householdIDs back.

It looks like householdID needs to be an integer, and thus any number > MAXINT (=2147483647) is mapped to a negative number? The maximum householdID I find in the output is exactly 2147483647.

Can this be changed to allow for a larger number of digits?
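
For readers following along, the limit mentioned above can be sketched as a simple range check. This is only an illustration, not code from the package; the function name `fits_in_int32` and the example ID are hypothetical. How the package actually converts out-of-range values is not shown in this thread.

```cpp
#include <cstdint>
#include <limits>

// Returns true when `id` can be stored in a 32-bit signed integer
// without loss. The upper bound is 2147483647.
bool fits_in_int32(std::int64_t id) {
    return id >= std::numeric_limits<std::int32_t>::min() &&
           id <= std::numeric_limits<std::int32_t>::max();
}

// Hypothetical 12-digit household ID, used only for illustration.
constexpr std::int64_t example_hid = 123456789012LL;
```

A 12-digit ID such as `example_hid` fails this check, which is consistent with the IDs being altered once they are forced into an `int`.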

@JohannesGuss
Collaborator

Yes, it needs to be an integer right now. Currently you supply the whole data set as std::vector< std::vector<int> > and just tell the procedure where householdID, the geographic variables, ... are in the data set, so they all need to be integers.

We could go from std::vector< std::vector<int> > to std::vector< std::vector<double> >. I think that should work without too much trouble, but I would have to check.
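
As a rough check on the `double` idea: assuming IEEE 754 doubles, every integer up to 2^53 in magnitude is exactly representable, so 12-digit IDs would fit. The sketch below is not from the package, and the names `kMaxExactDouble` and `survives_double_round_trip` are made up for illustration.

```cpp
#include <cstdint>

// 2^53: every integer in [-2^53, 2^53] is exactly representable
// as an IEEE 754 double (assumed here).
constexpr std::int64_t kMaxExactDouble = 9007199254740992LL;

// Conservative check: returns true when `id` lies in the exactly
// representable range and is unchanged after a round trip through
// double. IDs outside the range are rejected, even though some of
// them happen to be representable.
bool survives_double_round_trip(std::int64_t id) {
    if (id > kMaxExactDouble || id < -kMaxExactDouble) return false;
    const double as_double = static_cast<double>(id);
    return static_cast<std::int64_t>(as_double) == id;
}
```

The largest 12-digit ID, 999999999999, is far below 2^53, so it passes this check.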

Or we change how the inputs work and supply them separately, so that hid is not the position of the column in data but the column vector itself. If we go down this road, it would make sense to change this for the other parameters too. We would also need to specify what the output should look like, since the current implementation takes the data set as input and simply returns it with changed rows.
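
The difference between passing a position and passing the column itself could look roughly like this. It is a sketch under assumptions: it assumes each inner vector is one record, and `extract_column` is a hypothetical helper, not part of the package.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Assumption: each inner vector is one record (row), and `col` is the
// zero-based position of a variable such as the household ID.
// Pulls that variable out as a standalone column vector.
std::vector<int> extract_column(const std::vector<std::vector<int>>& data,
                                std::size_t col) {
    std::vector<int> column;
    column.reserve(data.size());
    for (const auto& record : data) {
        if (col >= record.size()) {
            throw std::out_of_range("column index outside record");
        }
        column.push_back(record[col]);
    }
    return column;
}
```

With a separate column, its element type could in principle be chosen independently of the rest of the data set (for example a 64-bit integer for the ID), at the cost of a wider interface.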

What do you think would be the best option, also regarding possible changes for your Java frontend?
I think std::vector< std::vector<double> > might be the best option.

@mescudero84

I have a problem with version 0.2.0, related to integer values, that I did not have in 0.1.0.
When I run

dat_swapped = recordSwap(data = dat, similar, hierarchy,
                         risk_variables, hid, k_anonymity,
                         swaprate, seed = 123456)

I get:
"Error in recordSwap(data = dat, similar = similar, hierarchy = hierarchy, :
data must contain only integer values at this point - this condition might get droped in a future release"
However, all my variables are integer.
Thank you

@JohannesGuss
Collaborator

@mescudero84 This should be fixed now; there was a bug in the input check.
The function now checks whether any column in data is non-numeric or has any value with a decimal part.
If you reinstall version 0.2.0, this should work.
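
The kind of check described above (no value with a decimal part) can be sketched as follows. This is only an illustration in C++ of the idea, not the package's actual check; `is_whole_number` and `all_whole_numbers` are hypothetical names.

```cpp
#include <cmath>
#include <vector>

// Returns true when `x` is finite and has no fractional part.
bool is_whole_number(double x) {
    return std::isfinite(x) && std::trunc(x) == x;
}

// Returns true when every value in every column is a whole number.
bool all_whole_numbers(const std::vector<std::vector<double>>& columns) {
    for (const auto& column : columns) {
        for (double x : column) {
            if (!is_whole_number(x)) return false;
        }
    }
    return true;
}
```

A check like this accepts data stored as numeric but holding only whole values, which is the case that the earlier bug rejected according to the report above.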

@mescudero84

Ok, thank you!
