This repository has been archived by the owner on Oct 29, 2020. It is now read-only.

uncomment the "register" line in the pig script #4

Open
jendap opened this issue Aug 31, 2012 · 5 comments

Comments

@jendap

jendap commented Aug 31, 2012

Otherwise the ExtractSizes UDF is not found / resolved.

@traviscrawford
Contributor

Hey jendap, thanks for taking a look at HDFS-DU!

The commented-out register line is actually intentional: it lets the unit test use that script directly, instead of keeping a copy somewhere that could get out of date. Also, when users run this I don't know where they will have copied the UDF jar, so that path likely needs to be set per environment.

Can you just uncomment the line and set it to whatever path is appropriate for your environment?

@dvryaboy

Hm. We could have a parametrized register, with a default value. The unit test would be able to reset that value, and we could tell users to set it if they move the jar. That way the script doesn't need modification.
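
A minimal sketch of what such a parametrized register could look like, using Pig's parameter substitution. The parameter name UDF_JAR and the default path are hypothetical, not taken from the HDFS-DU script:

```pig
-- UDF_JAR is a hypothetical parameter name; the default path is a placeholder.
-- %default supplies a value that is used only when none is passed in.
%default UDF_JAR '/path/to/udf.jar'

-- Parameter substitution replaces $UDF_JAR before the script is parsed.
register $UDF_JAR;
```

A user who moved the jar could then override the default on the command line with something like `pig -param UDF_JAR=/some/other/path.jar script.pig` (script name assumed), and the unit test could pass its own value the same way.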


@traviscrawford
Contributor

Since the unit test already has the UDF class on the classpath, the register is not needed there.

Any clue how Pig behaves when registering either a fake path or no path at all?

@dvryaboy

dvryaboy commented Sep 4, 2012

You can register '/dev/null', seems to work ok :).

D


@traviscrawford
Contributor

This fails with ERROR 4002: Can't read file: /doesnotexist.pig

register /doesnotexist.pig;
a = load '/etc/hosts' using PigStorage();
dump a;

For now I'd like to keep this as-is and ask users to uncomment that line, setting it to wherever they put the UDF jar.

What would be an awesome pull request is removing the need for this Pig script entirely, and instead adding an OfflineImageViewer-based tool that generates the dataset directly. The Pig script was super useful in development when we didn't know what data we needed, but now that we know what dataset to produce, we could simply dump it directly when parsing the fsimage.

Thoughts?
