Table names for multi-part inserts #186

sundeepn · 2013-10-14T02:10:09Z

Fix to ensure that table names for multi-insert/partitioned cached tables are reflected on the Shark UI.


harveyfeng · 2013-10-14T02:57:49Z

Hi Sundeep, the current Shark master doesn't include support for partitioned cached tables.
"insert into" commands that involve UnionRDDs in MemoryStoreSinkOperator are appends to a single, non-partitioned table.
It seems like this patch tracks how many sequential appends (i.e., "insert into"s) have been done to each table, but doesn't account for new RDDs created by interleaved "insert overwrite"s - those RDDs are assigned the table name.

sundeepn · 2013-10-14T17:42:02Z

Hi Harvey, the current patch is meant to let users track storage/memory usage on the Shark Storage UI per table rather than per 'rdd_###' entry. Inserts/overwrites to cached tables make the current Storage UI quite hard to follow.

It does not handle drop partitions and overwrites in any special way, but it does guarantee that each block of data is identified by a unique number and has the table name associated with it on the UI.
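
To illustrate the guarantee described above, here is a minimal sketch in plain Python (not Shark code and not the patch itself). It assumes a display name of the form `<table>_<n>`; that format, `BlockNamer`, and `next_name` are hypothetical names chosen for the example.

```python
from collections import defaultdict
from itertools import count


class BlockNamer:
    """Hands out display names that pair a table name with a unique number.

    Hypothetical helper: the '<table>_<n>' format is an assumption.
    """

    def __init__(self):
        # One independent counter per table, each starting at 0.
        self._counters = defaultdict(count)

    def next_name(self, table_name: str) -> str:
        # Every call consumes the next number for that table, so no two
        # blocks of the same table ever share a display name.
        return f"{table_name}_{next(self._counters[table_name])}"


namer = BlockNamer()
names = [
    namer.next_name("sales"),
    namer.next_name("sales"),
    namer.next_name("users"),
]
print(names)
```

Keeping a separate counter per table keeps the numbers small and lets a reader of the UI group entries by their table prefix.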

Once partition support is available, I plan to submit another patch whose naming conventions are derived from Hive's partition information.

AmplabJenkins · 2013-10-14T19:17:36Z

Can one of the admins verify this patch?

harveyfeng · 2013-10-16T01:58:24Z

Yeah, the storage UI is a bit confusing right now :(
Assigning unique IDs to RDDs created from "insert into" definitely helps, but is there a way to assign unique identifiers to RDDs created from "insert overwrite", and possibly distinguish between valid or invalid RDDs? For example, right now it seems like five "insert overwrite" commands will result in five RDDs displayed under the same (table) name.
One way might be to mark overwritten RDDs with something like "stale_table-name".
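
A rough sketch of the "stale" marking idea, in plain Python rather than Shark code. `StorageNames` and its methods are hypothetical, and the `stale_` prefix follows the suggestion above; a real implementation would presumably rename the RDDs themselves instead of keeping a separate dictionary.

```python
class StorageNames:
    """Tracks the display name of each cached RDD, keyed by a numeric id."""

    def __init__(self):
        self._names = {}  # rdd_id -> display name
        self._live = {}   # table -> ids of RDDs that still back the table

    def insert_into(self, table, rdd_id):
        # An append adds a new live RDD; earlier RDDs stay valid.
        self._live.setdefault(table, []).append(rdd_id)
        self._names[rdd_id] = table

    def insert_overwrite(self, table, rdd_id):
        # An overwrite invalidates every RDD that backed the table so far.
        for old_id in self._live.get(table, []):
            self._names[old_id] = f"stale_{table}"
        self._live[table] = [rdd_id]
        self._names[rdd_id] = table

    def display_name(self, rdd_id):
        return self._names[rdd_id]


ui = StorageNames()
ui.insert_overwrite("sales", 1)
ui.insert_into("sales", 2)
ui.insert_overwrite("sales", 3)
print([ui.display_name(i) for i in (1, 2, 3)])
```

With this bookkeeping, only the RDDs that currently back a table carry its plain name, so repeated overwrites no longer produce several identical entries.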

sundeepn · 2013-10-16T17:27:51Z

Based on Hive's documentation, shouldn't an insert overwrite on a table unpersist the existing RDDs? (For a partitioned table, it would unpersist only the overwritten partitions.) If that is the case, I can push a fix on that front.
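
The behaviour being asked about could be modelled roughly as follows. This is a plain-Python sketch under stated assumptions, not Shark's implementation: `CachedBlock` is a stand-in for a cached RDD, `TableCache` is hypothetical, and appends ("insert into") are left out to keep the example short.

```python
class CachedBlock:
    """Stand-in for a cached RDD; only tracks whether it is still persisted."""

    def __init__(self, label):
        self.label = label
        self.persisted = True

    def unpersist(self):
        self.persisted = False


class TableCache:
    def __init__(self):
        # table -> {partition key -> CachedBlock}.
        # The key None holds the data of a whole-table overwrite.
        self._tables = {}

    def insert_overwrite(self, table, block, partition=None):
        parts = self._tables.setdefault(table, {})
        if partition is None:
            # Whole-table overwrite: release everything cached so far.
            for old in parts.values():
                old.unpersist()
            parts.clear()
        else:
            # Partition overwrite: release only the replaced partition.
            old = parts.pop(partition, None)
            if old is not None:
                old.unpersist()
        parts[partition] = block


cache = TableCache()
a, b, c, d = (CachedBlock(x) for x in "abcd")
cache.insert_overwrite("t", a, partition="p1")
cache.insert_overwrite("t", b, partition="p2")
cache.insert_overwrite("t", c, partition="p1")  # releases only a
state_after_partition_overwrite = (a.persisted, b.persisted, c.persisted)
cache.insert_overwrite("t", d)                  # releases b and c
```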

harveyfeng · 2013-10-17T00:01:15Z

Yeah, that sounds good - created a ticket for that here: https://spark-project.atlassian.net/browse/SHARK-202.
Could you assign yourself to it? :)

sundeepn · 2013-10-17T00:35:37Z

Sure. I do not seem to have permissions to assign myself the ticket. If you can help with that, I will take on the ticket. :)

harveyfeng · 2013-10-17T01:07:26Z

Done - assigned it to you. Thx!

harveyfeng · 2013-10-17T01:11:01Z

Oh, it looks like the assignments were concurrent....

rxin · 2013-11-01T06:17:13Z

What's the status of this PR?

Fix to ensure tablenames for multi-insert/partitioned cached table get reflected on the shark UI

2844c67

harveyfeng mentioned this pull request Oct 28, 2013

Handle database namespaces for cached tables #196

Closed
