HIVE-4963 [jira] Support in memory PTF partitions
Accepted, Public

Next Step: arc land 'HIVE-4963-2'
Author: hbutani
Reviewers: JIRA, ashutoshc
Lint: Lint OK
Unit: No Unit Test Coverage
Branch: HIVE-4963-2
Apply Patch: arc patch D12279
Arcanist Project: Restricted Arcanist Project
Subscribers: None
Projects: None
Summary

fix lint issues

PTF partitions operate in a defensive mode that assumes partitions will not fit in memory. As a result, accessing elements incurs a significant deserialization overhead.

Allow the user to specify that there is enough memory to hold partitions through a 'hive.ptf.partition.fits.in.mem' option.
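A minimal Java sketch of the difference between the two modes, assuming a toy Ctrl-A-delimited string encoding in place of a real SerDe. The `Partition` interface, its two implementations, and the `create` factory are hypothetical; only the option name 'hive.ptf.partition.fits.in.mem' comes from the summary, and here it is read from a plain `Map`, not from Hive's configuration classes.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class PartitionModes {
    // Toy row encoding (hypothetical): fields joined by Ctrl-A, the default
    // Hive field delimiter. A real partition would use the table's SerDe.
    static final String DELIM = "\u0001";
    static int deserializeCalls = 0;

    static byte[] serialize(String[] row) {
        return String.join(DELIM, row).getBytes(StandardCharsets.UTF_8);
    }

    static String[] deserialize(byte[] bytes) {
        deserializeCalls++;
        return new String(bytes, StandardCharsets.UTF_8).split(DELIM, -1);
    }

    interface Partition {
        void append(byte[] serializedRow);
        String[] getAt(int i);
        int size();
    }

    // Defensive mode: rows stay serialized, so every access deserializes.
    static class SerializedPartition implements Partition {
        private final List<byte[]> rows = new ArrayList<>();
        public void append(byte[] serializedRow) { rows.add(serializedRow); }
        public String[] getAt(int i) { return deserialize(rows.get(i)); }
        public int size() { return rows.size(); }
    }

    // In-memory mode: deserialize once on append, then return cached objects.
    static class InMemoryPartition implements Partition {
        private final List<String[]> rows = new ArrayList<>();
        public void append(byte[] serializedRow) { rows.add(deserialize(serializedRow)); }
        public String[] getAt(int i) { return rows.get(i); }
        public int size() { return rows.size(); }
    }

    // Hypothetical factory: picks the mode from the option named in the summary.
    static Partition create(Map<String, String> conf) {
        boolean fitsInMem = Boolean.parseBoolean(
            conf.getOrDefault("hive.ptf.partition.fits.in.mem", "false"));
        return fitsInMem ? new InMemoryPartition() : new SerializedPartition();
    }

    public static void main(String[] args) {
        for (String flag : new String[] {"false", "true"}) {
            deserializeCalls = 0;
            Partition p = create(Map.of("hive.ptf.partition.fits.in.mem", flag));
            for (int i = 0; i < 100; i++) {
                p.append(serialize(new String[] {"row" + i, "x"}));
            }
            // Read every row 10 times, as a windowing function might.
            for (int pass = 0; pass < 10; pass++) {
                for (int i = 0; i < p.size(); i++) {
                    p.getAt(i);
                }
            }
            System.out.println("fits.in.mem=" + flag + ": " + deserializeCalls + " deserializations");
        }
    }
}
```

With 100 rows each read 10 times, the defensive mode performs 1000 deserializations and the in-memory mode performs 100: each row is deserialized once on append, at the cost of keeping all of the deserialized objects on the heap.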

The savings depend on partition size and, for windowing, on the number of UDAFs and the window ranges. For example, in the following (admittedly extreme) case, PTFOperator execution time dropped from 39 seconds to 8 seconds:

select t, s, i, b, f, d,
min(t) over(partition by 1 rows between unbounded preceding and current row),
min(s) over(partition by 1 rows between unbounded preceding and current row),
min(i) over(partition by 1 rows between unbounded preceding and current row),
min(b) over(partition by 1 rows between unbounded preceding and current row)
from over10k
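A rough way to see why this query is extreme: assuming a naive evaluator that re-scans the whole frame for every output row (an assumption; the patch does not describe the evaluator), each UDAF reads n(n+1)/2 rows for a partition of n rows, and in the defensive mode each read costs a deserialization. The Java sketch below only counts reads; the class and method names are hypothetical, and n = 1000 is an arbitrary illustration, not the size of over10k.

```java
public class WindowAccessCount {
    // Naive evaluation of min(col) over (rows between unbounded preceding and
    // current row): the frame for output row i is rows [0, i], and every row
    // in the frame is read again. In the defensive partition mode, each read
    // would also be a deserialization.
    static long[] runningMinNaive(long[] col, long[] reads) {
        long[] out = new long[col.length];
        for (int i = 0; i < col.length; i++) {
            long min = Long.MAX_VALUE;
            for (int j = 0; j <= i; j++) {
                reads[0]++;
                min = Math.min(min, col[j]);
            }
            out[i] = min;
        }
        return out;
    }

    public static void main(String[] args) {
        int n = 1000;   // rows in the single partition ("partition by 1")
        int udafs = 4;  // min(t), min(s), min(i), min(b)
        long[] col = new long[n];
        for (int i = 0; i < n; i++) {
            col[i] = (i * 7919L) % 1000;
        }
        long[] reads = {0};
        for (int u = 0; u < udafs; u++) {
            runningMinNaive(col, reads);
        }
        // udafs * n * (n + 1) / 2 = 4 * 500500
        System.out.println(reads[0]); // 2002000
    }
}
```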

Test Plan

EMPTY

hbutani updated this revision (Aug 20 2013, 1:53 AM).
  • Merge remote-tracking branch 'origin' into HIVE-4963-2
  • update RowContainer based on template parameter change from Row to ROW
ashutoshc added a comment (Aug 21 2013, 4:20 PM).

Seems like there are more opportunities to make this efficient, but those can be dug into later. This patch is a step in the right direction because it reuses existing infra. Any improvements we make now may also benefit other spilling operators, like join. Really makes me happy :)
Apart from the code comments, I would also ask you to add a testcase that sets the config value (cachesize) to zero, so that it spills on every record and exercises all of these new code paths.
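The testcase idea can be illustrated with a minimal, hypothetical spilling container: it keeps at most `cacheSize` rows in memory and spills the current block when it is full, so a cache size of zero forces a spill on every record. A `ByteArrayOutputStream` stands in for the temporary file, and `writeUTF`/`readUTF` stand in for real row serialization; none of these names are Hive's `RowContainer` API.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class SpillSketch {
    // Hypothetical container: keeps at most cacheSize rows in memory and
    // spills the current block when it is full.
    static class SpillingContainer {
        private final int cacheSize;
        private final List<String> block = new ArrayList<>();
        // Stand-in for the temporary spill file a real implementation writes.
        private final ByteArrayOutputStream spillFile = new ByteArrayOutputStream();
        private final DataOutputStream out = new DataOutputStream(spillFile);
        private int spilledRows = 0;
        int spillCount = 0;

        SpillingContainer(int cacheSize) { this.cacheSize = cacheSize; }

        void add(String row) throws IOException {
            block.add(row);
            // A cache size of 0 is treated like 1: every record triggers a spill.
            if (block.size() >= Math.max(cacheSize, 1)) {
                spill();
            }
        }

        private void spill() throws IOException {
            for (String r : block) {
                out.writeUTF(r);
            }
            spilledRows += block.size();
            block.clear();
            spillCount++;
        }

        // Reads spilled rows back first, then the rows still in memory,
        // which preserves insertion order.
        List<String> readAll() throws IOException {
            out.flush();
            List<String> rows = new ArrayList<>();
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(spillFile.toByteArray()));
            for (int i = 0; i < spilledRows; i++) {
                rows.add(in.readUTF());
            }
            rows.addAll(block);
            return rows;
        }
    }

    public static void main(String[] args) throws IOException {
        for (int cacheSize : new int[] {0, 25}) {
            SpillingContainer c = new SpillingContainer(cacheSize);
            for (int i = 0; i < 100; i++) {
                c.add("row" + i);
            }
            System.out.println("cacheSize=" + cacheSize + " spills=" + c.spillCount
                + " rows=" + c.readAll().size());
        }
    }
}
```

With a cache size of zero, all 100 rows go through the spill and read-back path; with 25, only four spills happen, so most of the new code paths would see far fewer records.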

ql/src/java/org/apache/hadoop/hive/ql/exec/PTFPartition.java
Line 57

This config should really govern how much memory we are willing to allocate (in bytes), not the number of rows, but that's a topic for another JIRA since you are reusing existing code.
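To illustrate the point, a small sketch, assuming uniform row sizes that are known up front (real rows would need a size estimate). The helper names and the numbers in `main` are hypothetical; they show that a row-count limit lets memory use vary with row width, while a byte budget keeps memory fixed and lets the row count vary instead.

```java
public class MemoryBudget {
    // Hypothetical helper: memory held when the limit is a row count,
    // assuming every row has the same size.
    static long bytesHeldByRowLimit(int maxRows, long rowBytes) {
        return maxRows * rowBytes;
    }

    // Hypothetical helper: rows that fit when the limit is a byte budget,
    // spilling as soon as the next row would exceed it.
    static long rowsHeldByByteBudget(long budgetBytes, long rowBytes) {
        return budgetBytes / rowBytes;
    }

    public static void main(String[] args) {
        int maxRows = 1000;           // illustrative limit, not a Hive default
        long budgetBytes = 1_000_000; // illustrative 1 MB budget
        for (long rowBytes : new long[] {100, 10_000}) {
            System.out.println("row=" + rowBytes + "B: row limit holds "
                + bytesHeldByRowLimit(maxRows, rowBytes) + " bytes; byte budget holds "
                + rowsHeldByByteBudget(budgetBytes, rowBytes) + " rows");
        }
    }
}
```

Under these numbers, the same 1000-row limit holds 100,000 bytes of narrow rows but 10,000,000 bytes of wide ones, while the 1 MB budget holds 10,000 narrow rows or 100 wide ones.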

Line 89

I think you need to do this because current RowContainers can only hold plain (deserialized) Java objects. It seems we could improve this by writing a RowContainer that can hold writables, avoiding the unnecessary deserialization and memcpy here. Something worth exploring as a follow-up issue.

Line 137

Instead of try-catch-rethrow, shall we just add a throws clause to the method signature? It makes the code more readable and is arguably faster.
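For reference, the two styles side by side in a minimal Java sketch. The actual exception types in the patch are not shown in this review, so a hypothetical checked `OperatorException` and a `RuntimeException` wrapper are used for illustration.

```java
public class RethrowStyles {
    // Hypothetical checked exception standing in for the operator's exception type.
    static class OperatorException extends Exception {
        OperatorException(String msg) { super(msg); }
    }

    // Stand-in for a call that can fail with a checked exception.
    static String readRow(int i) throws OperatorException {
        if (i < 0) {
            throw new OperatorException("negative row index " + i);
        }
        return "row" + i;
    }

    // Style the review pushes back on: catch, wrap, and rethrow.
    static String getAtWrapped(int i) {
        try {
            return readRow(i);
        } catch (OperatorException e) {
            throw new RuntimeException(e);
        }
    }

    // Suggested style: declare the checked exception and let it propagate.
    static String getAt(int i) throws OperatorException {
        return readRow(i);
    }
}
```

The declared form keeps the original exception type visible to callers and removes a layer of wrapping from every stack trace.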

Line 148

This sanity check is in a tight loop. Ideally we should not have such checks in an inner loop, but let's leave it here until we have more confidence in the code. It would be good to add a note describing the assumption we would rely on if we remove this check in the future.
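One way to act on this later, sketched under assumptions: the check at line 148 is not shown in this review, so a hypothetical bounds check stands in for it. The first method checks every row; the second validates the range once before the loop, records the assumption in a comment, and keeps a Java `assert` that only runs when the JVM is started with assertions enabled (`-ea`).

```java
public class LoopChecks {
    // Per-row sanity check inside the tight loop (the style the review flags).
    static long sumChecked(long[] rows, int from, int to) {
        long sum = 0;
        for (int i = from; i < to; i++) {
            if (i < 0 || i >= rows.length) {
                throw new IllegalStateException("row index outside partition: " + i);
            }
            sum += rows[i];
        }
        return sum;
    }

    // Same loop with the check hoisted. Assumption recorded here instead of
    // re-checked per row: callers pass 0 <= from <= to <= rows.length.
    static long sumHoisted(long[] rows, int from, int to) {
        if (from < 0 || from > to || to > rows.length) {
            throw new IllegalArgumentException("bad range [" + from + ", " + to + ")");
        }
        long sum = 0;
        for (int i = from; i < to; i++) {
            // Only evaluated when the JVM runs with assertions enabled (-ea).
            assert i >= 0 && i < rows.length : "row index outside partition: " + i;
            sum += rows[i];
        }
        return sum;
    }
}
```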

Line 160

Similar comment about try-catch-rethrow.

ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/PTFRowContainer.java
Line 80

Awesome comments!

Line 94

If I understand this correctly, this function will serialize again before spilling, so under memory pressure we do a ser-deser round trip without performing any useful work. This ties back to my earlier comment on eager deserialization.
This whole mechanism is worth exploring later.
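The round trip can be made concrete by counting SerDe calls in a minimal Java sketch. It assumes that rows arrive serialized, that memory pressure spills the whole block, and that spilled rows are read back once; the counting `serialize`/`deserialize` functions are hypothetical stand-ins, not Hive's SerDe interface. The second method models the writable-holding container suggested in the comment on line 89.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class RoundTripCount {
    static int serCalls = 0;
    static int deserCalls = 0;

    // Counting stand-ins for a SerDe (hypothetical).
    static String deserialize(byte[] bytes) {
        deserCalls++;
        return new String(bytes, StandardCharsets.UTF_8);
    }

    static byte[] serialize(String row) {
        serCalls++;
        return row.getBytes(StandardCharsets.UTF_8);
    }

    // Eager path: deserialize on add, re-serialize when the block spills,
    // then deserialize again when the row is read back for evaluation.
    static List<String> eagerSpillAndRead(List<byte[]> input) {
        List<String> block = new ArrayList<>();
        for (byte[] b : input) block.add(deserialize(b));
        List<byte[]> spilled = new ArrayList<>();
        for (String row : block) spilled.add(serialize(row));
        List<String> result = new ArrayList<>();
        for (byte[] b : spilled) result.add(deserialize(b));
        return result;
    }

    // Writable-holding path: keep the incoming bytes as they are, so a spill
    // writes them unchanged and each row is deserialized once, on read.
    static List<String> lazySpillAndRead(List<byte[]> input) {
        List<byte[]> spilled = new ArrayList<>(input);
        List<String> result = new ArrayList<>();
        for (byte[] b : spilled) result.add(deserialize(b));
        return result;
    }

    public static void main(String[] args) {
        List<byte[]> input = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            input.add(("row" + i).getBytes(StandardCharsets.UTF_8));
        }
        eagerSpillAndRead(input);
        System.out.println("eager: ser=" + serCalls + " deser=" + deserCalls);
        serCalls = 0;
        deserCalls = 0;
        lazySpillAndRead(input);
        System.out.println("lazy:  ser=" + serCalls + " deser=" + deserCalls);
    }
}
```

For n rows, the eager path performs n serializations and 2n deserializations, while the writable-holding path performs no serializations and n deserializations.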

hbutani updated this revision (Aug 22 2013, 11:16 PM).
  • Merge branch 'trunk' into HIVE-4963-2
  • changes based on review.
  • fix lint issues
ashutoshc accepted this revision (Aug 24 2013, 5:30 AM).

+1

Revision Update History

Diff | ID | Base | Description | Created
Diff 1 | 37983 | 1514119 | | Aug 15 2013, 3:35 AM
Diff 2 | 38391 | 1515254 | Merge remote-tracking branch 'origin' into HIVE-4963-2 | Aug 20 2013, 1:52 AM
Diff 3 | 38745 | 1516180 | Merge branch 'trunk' into HIVE-4963-2 | Aug 22 2013, 11:15 PM

Local Commits

Commit | Tree | Parents | Author | Summary | Date
55e542d49d28 | 5581c1ff39bd | dbb9cc09c817 | Harish Butani | fix lint issues | Aug 22 2013, 11:14 PM
dbb9cc09c817 | 1a24175522b3 | f86d3e3363bc | Harish Butani | changes based on review. | Aug 22 2013, 11:02 PM
f86d3e3363bc | d6437d7623a2 | 75a97f8dfcce, 59cdb5a9a36d | Harish Butani | Merge branch 'trunk' into HIVE-4963-2 | Aug 21 2013, 4:02 PM
75a97f8dfcce | 533f1b5ccdc5 | 2bceb05134ea | Harish Butani | update RowContainer based on template parameter change from Row to ROW | Aug 20 2013, 1:38 AM
2bceb05134ea | 3d445bf6e3fa | c4635ccd6aaa, 83fa88ae3216 | Harish Butani | Merge remote-tracking branch 'origin' into HIVE-4963-2 | Aug 19 2013, 10:24 PM
c4635ccd6aaa | 6be36f15aa4e | 6a3ea9a4c0a4 | Harish Butani | HIVE-4963 [jira] Support in memory PTF partitions | Aug 15 2013, 2:35 AM
6a3ea9a4c0a4 | f359670d0c29 | df2314b3a286, 6dd2f0da9927 | Harish Butani | HIVE-4963 [jira] Support in memory PTF partitions | Aug 15 2013, 2:27 AM
df2314b3a286 | 91f20bb66780 | d74260966279 | Harish Butani | switch from BytebasedList to PTFRowContainer | Aug 15 2013, 2:26 AM

Diff 38745

ql/src/java/org/apache/hadoop/hive/ql/exec/PTFOperator.java
ql/src/java/org/apache/hadoop/hive/ql/exec/PTFPartition.java
ql/src/java/org/apache/hadoop/hive/ql/exec/PTFPersistence.java
ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/PTFRowContainer.java
ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/RowContainer.java
ql/src/java/org/apache/hadoop/hive/ql/parse/PTFTranslator.java
ql/src/java/org/apache/hadoop/hive/ql/plan/PTFDesc.java
ql/src/java/org/apache/hadoop/hive/ql/plan/PTFDeserializer.java
ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFLeadLag.java
ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/NPath.java
ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/TableFunctionEvaluator.java
ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/TableFunctionResolver.java
ql/src/java/org/apache/hadoop/hive/ql/udf/ptf/WindowingTableFunction.java
ql/src/test/org/apache/hadoop/hive/ql/exec/persistence/TestPTFRowContainer.java
ql/src/test/queries/clientpositive/windowing_adjust_rowcontainer_sz.q
ql/src/test/results/clientpositive/windowing_adjust_rowcontainer_sz.q.out
