HIVE-2942 [jira] substr on string containing UTF-8 characters produces StringIndexOutOfBoundsException
Accepted
Public

Author
kevinwilfong
Reviewers
JIRA
njain
ashutoshc
navis
pauly
Lint
Lint OK
Unit
No Unit Test Coverage
Branch
svn
Apply Patch
arc patch D2727
Arcanist Project
Restricted Arcanist Project
Subscribers
None
Projects
None
Summary

https://issues.apache.org/jira/browse/HIVE-2942

Fixed UDFSubstr so that substr on strings succeeds when the input contains multi-byte UTF-8 characters, by using the String length (a character count) instead of the Text length (a UTF-8 byte count).

Also, updated QTestUtil so that we can now write tests which include UTF-8 characters.

After HIVE-2792, the substr function produces a StringIndexOutOfBoundsException when called on a string containing UTF-8 characters without the length argument being present.

E.g.
select substr(str, 1) from table1;

now fails with that exception if str contains a UTF-8 character for any row in the table.
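
To illustrate the mismatch, here is a minimal standalone sketch. It is not the code from this patch and does not depend on Hive or Hadoop. The class name SubstrLengthSketch and both helper methods are hypothetical. It assumes the failure comes from using a UTF-8 byte count (which is what Hadoop's Text.getLength() reports) as the end index into a Java String.

```java
import java.nio.charset.StandardCharsets;

public class SubstrLengthSketch {

    // Hypothetical helper: substring from a 1-based start up to the given end index.
    static String substrWithEnd(String s, int start, int end) {
        return s.substring(start - 1, end);
    }

    // Length of the UTF-8 encoding in bytes. Stands in for Text.getLength(),
    // which counts bytes, not characters (assumption for this sketch).
    static int utf8ByteLength(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String str = "caf\u00e9"; // 'e' with acute accent takes two bytes in UTF-8

        System.out.println("chars = " + str.length());
        System.out.println("bytes = " + utf8ByteLength(str));

        // Using the String length as the end bound stays within range.
        System.out.println(substrWithEnd(str, 1, str.length()));

        // Using the byte length as the end bound overshoots the String.
        try {
            substrWithEnd(str, 1, utf8ByteLength(str));
        } catch (StringIndexOutOfBoundsException e) {
            System.out.println("byte length as end index: " + e.getClass().getSimpleName());
        }
    }
}
```

For pure ASCII input the two lengths are equal, which would explain why the problem only shows up on rows containing a multi-byte character.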

Test Plan

Added a testcase for UTF-8 characters to the substr testcases.

Verified the testcases for binary substrs continue to pass.

I am running the testcases now to make sure modifying QTestUtil doesn't produce any unexpected issues. Will update here if I find any.

kevinwilfong added a comment. (Via Legacy, Apr 11 2012, 5:06 PM)

Verified all the tests still pass.

pauly accepted this revision. (Via Legacy, Apr 12 2012, 11:12 PM)

+1

Revision Update History

Diff | ID | Base | Description | Created | Lint | Unit
Base | Base
Diff 1 | 8757 | 1311913 | | Apr 10 2012, 8:26 PM

Diff 8757

ql/src/java/org/apache/hadoop/hive/ql/udf/UDFSubstr.java


ql/src/test/org/apache/hadoop/hive/ql/QTestUtil.java


ql/src/test/queries/clientpositive/udf_substr.q


ql/src/test/results/clientpositive/udf_substr.q.out

