[aur-dev] [PATCH 4/4] rpc: allow multiple args on info query

elij elij.mx at gmail.com
Tue Apr 12 02:17:43 EDT 2011


On Mon, Apr 11, 2011 at 11:11 PM, Dan McGee <dpmcgee at gmail.com> wrote:
> On Tue, Apr 12, 2011 at 1:04 AM, elij <elij.mx at gmail.com> wrote:
>> On Mon, Apr 11, 2011 at 10:15 PM, Dan McGee <dan at archlinux.org> wrote:
>>> The majority of "real world" info requests [1] come in hefty batches. We
>>> would be better served to handle these in one request rather than
>>> multiple by allowing AUR clients to send multiple arguments.
>>>
>>> This enables things like this to work:
>>>    http://aur.test/rpc.php?type=info&arg[]=cups-xerox&arg[]=cups-mc2430dl&arg[]=10673
>>
>> This kind of breaks the query format convention (thanks php!).
>> It might be cleaner to use a separator (like a space) to break up args
>> (and simply split on space).
>>
>>    http://aur.test/rpc.php?type=info&arg=cups-xerox+cups-mc1430dl+10673
>>
>> (note using plus signs here as an example since spaces or %20s are
>> harder to 'see')
>
> I know it breaks "convention", but introducing another seemed a bit
> silly to me. All old queries work 100% however, if that wasn't clear.
> If we want to expand this to msearch or search down the road (which
> seemed totally plausible to me when designing the code here), where a
> space has totally different meaning and you would not want to split on
> it, this new syntax would not work at all.

That makes sense. You have convinced me.
:)

>>> Note to RPC users: unfortunately due to the asinine design of PHP, you
>>> unfortunately have to use the 'arg[]' syntax if you want more than one
>>> query argument, or you will only get the package satisfying the last arg
>>> you pass.
>>>
>>> [1] Rough data from April 11, 2011, with a total hit count of 1,109,163:
>>>     12 /login.php
>>>     13 /rpc.php?type=sarch
>>>     15 /rpc.php?type=msearch
>>>     16 /pingserver.php
>>>     16 /rpc.php
>>>     22 /logout.php
>>>    163 /passreset.php
>>>    335 /account.php
>>>    530 /pkgsubmit.php
>>>    916 /rss2.php
>>>   3838 /index.php
>>>   6752 /rss.php
>>>   9699 /
>>>  42478 /rpc.php?type=search
>>>  184737 /packages.php
>>>  681725 /rpc.php?type=info
>>>
>>> That means a whopping 61.5% of our requests were for info over the RPC
>>> interface; package pages are a distant second at only 16.7%.
>>>
>>> Signed-off-by: Dan McGee <dan at archlinux.org>
>>> ---
>>>  web/lib/aurjson.class.php |   59 ++++++++++++++++++++++++++++++++++++--------
>>>  1 files changed, 48 insertions(+), 11 deletions(-)
>>>
>>> diff --git a/web/lib/aurjson.class.php b/web/lib/aurjson.class.php
>>> index 9b0e1a0..57096d8 100644
>>> --- a/web/lib/aurjson.class.php
>>> +++ b/web/lib/aurjson.class.php
>>> @@ -101,6 +101,36 @@ class AurJSON {
>>>     }
>>>
>>>     /**
>>> +     * Parse the args to the info function. We may have a string or an array,
>>> +     * so do the appropriate thing. Within the elements, both * package IDs
>>> +     * and package names are valid; sort them into the relevant arrays and
>>> +     * escape/quote the names.
>>> +     * @param $args the arg string or array to parse.
>>> +     * @return mixed An array containing 'ids' and 'names'.
>>> +     **/
>>> +    private function parse_info_args($args) {
>>> +        if (!is_array($args)) {
>>> +            $args = array($args);
>>> +        }
>>> +
>>> +        $id_args = array();
>>> +        $name_args = array();
>>> +        foreach ($args as $arg) {
>>> +            if (!$arg) {
>>> +                continue;
>>> +            }
>>> +            if (is_numeric($arg)) {
>>> +                $id_args[] = intval($arg);
>>> +            } else {
>>> +                $escaped = mysql_real_escape_string($arg, $this->dbh);
>>> +                $name_args[] = "'" . $escaped . "'";
>>> +            }
>>> +        }
>>> +
>>> +        return array('ids' => $id_args, 'names' => $name_args);
>>> +    }
>>> +
>>> +    /**
>>>      * Performs a fulltext mysql search of the package database.
>>>      * @param $keyword_string A string of keywords to search with.
>>>      * @return mixed Returns an array of package matches.
>>> @@ -129,21 +159,28 @@ class AurJSON {
>>>      **/
>>>     private function info($pqdata) {
>>>         $fields = implode(',', self::$fields);
>>> +        $args = $this->parse_info_args($pqdata);
>>> +        $ids = $args['ids'];
>>> +        $names = $args['names'];
>>>
>>> -        $base_query = "SELECT {$fields} " .
>>> -            " FROM Packages WHERE ";
>>> +        if (!$ids && !$names) {
>>> +            return $this->json_error('Invalid query arguments');
>>> +        }
>>>
>>> -        if ( is_numeric($pqdata) ) {
>>> -            // just using sprintf to coerce the pqd to an int
>>> -            // should handle sql injection issues, since sprintf will
>>> -            // bork if not an int, or convert the string to a number 0
>>> -            $query_stub = "ID={$pqdata}";
>>> +        $query = "SELECT {$fields} " .
>>> +            " FROM Packages WHERE ";
>>> +        if ($ids) {
>>> +            $ids_value = implode(',', $args['ids']);
>>> +            $query .= "ID IN ({$ids_value})";
>>
>> I thought 'where in' queries in mysql were known to be terribly slow
>> compared to using 'where ='.
>
> Nope, at least not in this case- if things are indexed properly you
> have no problems, and we aren't using a subquery for the IN clause.
> mysql is at least smart enough to recognize the columns being filtered
> on are unique so it knows the row count is extremely limited before it
> even runs the query.

Oh, that's right. It was an IN with a *subquery* that was the pathological case.
Nice.

> mysql> explain select * from Packages where id in (10673, 6941) or
> Name in ('cups-xerox', 'cups-mc2430dl');
> +----+-------------+----------+-------------+---------------+--------------+---------+------+------+---------------------------------------------+
> | id | select_type | table    | type        | possible_keys | key
>    | key_len | ref  | rows | Extra
>   |
> +----+-------------+----------+-------------+---------------+--------------+---------+------+------+---------------------------------------------+
> |  1 | SIMPLE      | Packages | index_merge | PRIMARY,Name  |
> Name,PRIMARY | 66,4    | NULL |    4 | Using sort_union(Name,PRIMARY);
> Using where |
> +----+-------------+----------+-------------+---------------+--------------+---------+------+------+---------------------------------------------+
> 1 row in set (0.00 sec)
>


More information about the aur-dev mailing list