The Data Studio

Hive: By Default Does Not Allow Order By Result Column Number

Tested Using Hortonworks Data Platform (HDP) Sandbox, Release 2.5, Hive 2.1.0.2.5.0.0-1245

This qualifies as a minor annoyance because there are two easy workarounds. If you have been using other databases, though, it will probably catch you out the first time, and when you forget.

if your write a query like this:

         select
             postal_code,
             count(*)
         from
             customer 
         group by
             postal_code
         order by
             2;
       
you would expect to see the output ordered by the count(*)which is number 2 in the list of result columns. That isn't what happens; the rows are returned in random order. You can find out why if you go back and search through the 250 lines of INFO messages that you get by default with Hive 2.1. Hopefully, I'm seeing this because I'm using the Technical Preview. Near the beginning of this spiel is the following warning:

      INFO  : Warning: Using constant number 2 in order by. If you try to use position alias when hive.groupby.orderby.position.alias is false, the position alias will be ignored.
    
There are two easy workarounds. The first is to set the hive.groupby.orderby.position.alias parameter to true. The second is to give the count(*) an alias, like this:

      select
          postal_code,
          count(*) as frequency
      from
          customer 
      group by
          postal_code
      order by
          frequency;
    

Then it works fine. Every little hinders.

I would favour setting the parameter, otherwise you, or one of your users, will stumble over this and waste time.

This issue is related to Hive Cannot Order By An Aggregate Expression.