[aur-dev] [PATCH 4/4] rpc: allow multiple args on info query
elij
elij.mx at gmail.com
Tue Apr 12 02:17:43 EDT 2011
On Mon, Apr 11, 2011 at 11:11 PM, Dan McGee <dpmcgee at gmail.com> wrote:
> On Tue, Apr 12, 2011 at 1:04 AM, elij <elij.mx at gmail.com> wrote:
>> On Mon, Apr 11, 2011 at 10:15 PM, Dan McGee <dan at archlinux.org> wrote:
>>> The majority of "real world" info requests [1] come in hefty batches. We
>>> would be better served to handle these in one request rather than
>>> multiple by allowing AUR clients to send multiple arguments.
>>>
>>> This enables things like this to work:
>>> http://aur.test/rpc.php?type=info&arg[]=cups-xerox&arg[]=cups-mc2430dl&arg[]=10673
>>
>> This kind of breaks the query format convention (thanks php!).
>> It might be cleaner to use a separator (like a space) to break up args
>> (and simply split on space).
>>
>> http://aur.test/rpc.php?type=info&arg=cups-xerox+cups-mc1430dl+10673
>>
>> (note using plus signs here as an example since spaces or %20s are
>> harder to 'see')
>
> I know it breaks "convention", but introducing another seemed a bit
> silly to me. All old queries work 100% however, if that wasn't clear.
> If we want to expand this to msearch or search down the road (which
> seemed totally plausible to me when designing the code here), where a
> space has totally different meaning and you would not want to split on
> it, this new syntax would not work at all.
That makes sense. You have convinced me.
:)
>>> Note to RPC users: unfortunately due to the asinine design of PHP, you
>>> unfortunately have to use the 'arg[]' syntax if you want more than one
>>> query argument, or you will only get the package satisfying the last arg
>>> you pass.
>>>
>>> [1] Rough data from April 11, 2011, with a total hit count of 1,109,163:
>>> 12 /login.php
>>> 13 /rpc.php?type=sarch
>>> 15 /rpc.php?type=msearch
>>> 16 /pingserver.php
>>> 16 /rpc.php
>>> 22 /logout.php
>>> 163 /passreset.php
>>> 335 /account.php
>>> 530 /pkgsubmit.php
>>> 916 /rss2.php
>>> 3838 /index.php
>>> 6752 /rss.php
>>> 9699 /
>>> 42478 /rpc.php?type=search
>>> 184737 /packages.php
>>> 681725 /rpc.php?type=info
>>>
>>> That means a whopping 61.5% of our requests were for info over the RPC
>>> interface; package pages are a distant second at only 16.7%.
>>>
>>> Signed-off-by: Dan McGee <dan at archlinux.org>
>>> ---
>>> web/lib/aurjson.class.php | 59 ++++++++++++++++++++++++++++++++++++--------
>>> 1 files changed, 48 insertions(+), 11 deletions(-)
>>>
>>> diff --git a/web/lib/aurjson.class.php b/web/lib/aurjson.class.php
>>> index 9b0e1a0..57096d8 100644
>>> --- a/web/lib/aurjson.class.php
>>> +++ b/web/lib/aurjson.class.php
>>> @@ -101,6 +101,36 @@ class AurJSON {
>>> }
>>>
>>> /**
>>> + * Parse the args to the info function. We may have a string or an array,
>>> + * so do the appropriate thing. Within the elements, both * package IDs
>>> + * and package names are valid; sort them into the relevant arrays and
>>> + * escape/quote the names.
>>> + * @param $args the arg string or array to parse.
>>> + * @return mixed An array containing 'ids' and 'names'.
>>> + **/
>>> + private function parse_info_args($args) {
>>> + if (!is_array($args)) {
>>> + $args = array($args);
>>> + }
>>> +
>>> + $id_args = array();
>>> + $name_args = array();
>>> + foreach ($args as $arg) {
>>> + if (!$arg) {
>>> + continue;
>>> + }
>>> + if (is_numeric($arg)) {
>>> + $id_args[] = intval($arg);
>>> + } else {
>>> + $escaped = mysql_real_escape_string($arg, $this->dbh);
>>> + $name_args[] = "'" . $escaped . "'";
>>> + }
>>> + }
>>> +
>>> + return array('ids' => $id_args, 'names' => $name_args);
>>> + }
>>> +
>>> + /**
>>> * Performs a fulltext mysql search of the package database.
>>> * @param $keyword_string A string of keywords to search with.
>>> * @return mixed Returns an array of package matches.
>>> @@ -129,21 +159,28 @@ class AurJSON {
>>> **/
>>> private function info($pqdata) {
>>> $fields = implode(',', self::$fields);
>>> + $args = $this->parse_info_args($pqdata);
>>> + $ids = $args['ids'];
>>> + $names = $args['names'];
>>>
>>> - $base_query = "SELECT {$fields} " .
>>> - " FROM Packages WHERE ";
>>> + if (!$ids && !$names) {
>>> + return $this->json_error('Invalid query arguments');
>>> + }
>>>
>>> - if ( is_numeric($pqdata) ) {
>>> - // just using sprintf to coerce the pqd to an int
>>> - // should handle sql injection issues, since sprintf will
>>> - // bork if not an int, or convert the string to a number 0
>>> - $query_stub = "ID={$pqdata}";
>>> + $query = "SELECT {$fields} " .
>>> + " FROM Packages WHERE ";
>>> + if ($ids) {
>>> + $ids_value = implode(',', $args['ids']);
>>> + $query .= "ID IN ({$ids_value})";
>>
>> I thought 'where in' queries in mysql were known to be terribly slow
>> compared to using 'where ='.
>
> Nope, at least not in this case- if things are indexed properly you
> have no problems, and we aren't using a subquery for the IN clause.
> mysql is at least smart enough to recognize the columns being filtered
> on are unique so it knows the row count is extremely limited before it
> even runs the query.
Oh, that's right. It was an IN with a *subquery* that was the pathological case.
Nice.
> mysql> explain select * from Packages where id in (10673, 6941) or
> Name in ('cups-xerox', 'cups-mc2430dl');
> +----+-------------+----------+-------------+---------------+--------------+---------+------+------+---------------------------------------------+
> | id | select_type | table | type | possible_keys | key
> | key_len | ref | rows | Extra
> |
> +----+-------------+----------+-------------+---------------+--------------+---------+------+------+---------------------------------------------+
> | 1 | SIMPLE | Packages | index_merge | PRIMARY,Name |
> Name,PRIMARY | 66,4 | NULL | 4 | Using sort_union(Name,PRIMARY);
> Using where |
> +----+-------------+----------+-------------+---------------+--------------+---------+------+------+---------------------------------------------+
> 1 row in set (0.00 sec)
>
More information about the aur-dev
mailing list