[aur-dev] [PATCH 2/3] Add "mergepkgid" argument to pkg_delete()

Dan McGee dpmcgee at gmail.com
Wed Aug 10 19:08:57 EDT 2011


On Wed, Aug 10, 2011 at 5:48 PM, Lukas Fleischer
<archlinux at cryptocrack.de> wrote:
> On Wed, Aug 10, 2011 at 05:35:46PM -0500, Dan McGee wrote:
>> On Wed, Aug 10, 2011 at 4:51 PM, Lukas Fleischer
>> <archlinux at cryptocrack.de> wrote:
>> > On Tue, Aug 02, 2011 at 04:17:12PM +0200, Lukas Fleischer wrote:
>> >> On Tue, Aug 02, 2011 at 07:45:40AM -0500, Dan McGee wrote:
>> >> > On Tue, Aug 2, 2011 at 7:41 AM, Lukas Fleischer
>> >> > <archlinux at cryptocrack.de> wrote:
>> >> > > On Tue, Aug 02, 2011 at 07:13:05AM -0500, Dan McGee wrote:
>> >> > >> On Sun, Jul 31, 2011 at 12:05 PM, Lukas Fleischer
>> >> > >> <archlinux at cryptocrack.de> wrote:
>> >> > >> > This allows for merging comments and votes of deleted packages into
>> >> > >> > another one which is useful if a package needs to be renamed.
>> >> > >> >
>> >> > >> > Signed-off-by: Lukas Fleischer <archlinux at cryptocrack.de>
>> >> > >> > ---
>> >> > >> >  web/lib/pkgfuncs.inc.php |   24 +++++++++++++++++++++++-
>> >> > >> >  1 files changed, 23 insertions(+), 1 deletions(-)
>> >> > >> >
>> >> > >> > diff --git a/web/lib/pkgfuncs.inc.php b/web/lib/pkgfuncs.inc.php
>> >> > >> > index bb5a592..a81ee01 100644
>> >> > >> > --- a/web/lib/pkgfuncs.inc.php
>> >> > >> > +++ b/web/lib/pkgfuncs.inc.php
>> >> > >> > @@ -659,10 +659,11 @@ function pkg_flag ($atype, $ids, $action = True) {
>> >> > >> >  *
>> >> > >> >  * @param string $atype Account type, output of account_from_sid
>> >> > >> >  * @param array $ids Array of package IDs to delete
>> >> > >> > + * @param int $mergepkgid Package to merge the deleted ones into
>> >> > >> >  *
>> >> > >> >  * @return string Translated error or success message
>> >> > >> >  */
>> >> > >> > -function pkg_delete ($atype, $ids) {
>> >> > >> > +function pkg_delete ($atype, $ids, $mergepkgid) {
>> >> > >> >        if (!$atype) {
>> >> > >> >                return __("You must be logged in before you can delete packages.");
>> >> > >> >        }
>> >> > >> > @@ -678,6 +679,27 @@ function pkg_delete ($atype, $ids) {
>> >> > >> >        }
>> >> > >> >
>> >> > >> >        $dbh = db_connect();
>> >> > >> > +
>> >> > >> > +       if ($mergepkgid) {
>> >> > >> > +               /* Merge comments */
>> >> > >> > +               $q = "UPDATE IGNORE PackageComments ";
>> >> > >> Not a fan of this. This is not in any SQL standard so this would be
>> >> > >> another thing needing adjusted later if someone tried to run using
>> >> > >> non-MySQL. It is also extremely hacky IMO as you can end up
>> >> > >> suppressing real errors, and read this gem from the manual:
>> >> > >>     "Rows for which columns are updated to values that would cause
>> >> > >> data conversion errors are updated to the closest valid values
>> >> > >> instead."
>> >> > >> Which just plain scares me.
>> >> > >>
>> >> > >> I'm not even sure why you are doing IGNORE on the comments merge
>> >> > >> (these can never conflict?), but for the votes merge, simply do a
>> >> > >> * UPDATE where it doesn't already exist.
>> >> > >> * DELETE all remaining.
>> >> > >
>> >> > > Well, I actually can't remember what I was thinking when I wrote the
>> >> > > comments query. I probably just copy-pasted it. That's the best excuse I
>> >> > > can come up with, yeah :)
>> >> > >
>> >> > > I'll fix this and think of a better filter criteria for the votes merge
>> >> > > query. I don't think we need the delete all remaining (duplicate) votes
>> >> > > though, as we do remove the associated packages anyway and "ON DELETE"
>> >> > > should do the rest.
>> >> >
>> >> > Ahh, true on the auto-cascade DELETE thoughts. If you just add a
>> >> > little more logic to the WHERE clause you should be able to make the
>> >> > UPDATES selective enough to avoid the duplicates issue.
>> >>
>> >> Yeah. Well, the issue here is that there might also be duplicates within
>> >> the deleted packages (e.g. if a user voted on two of the packages that
>> >> we want to merge but didn't vote on the target package itself), so for
>> >> each of the affected votes we would have to:
>> >>
>> >> 1) Check if the user already voted on the target package. If so, discard
>> >>    the vote. An alternative approach is to exclude all votes that
>> >>    originate from a user who has voted on the target package as well
>> >>    right from the start.
>> >>
>> >> 2) Check if there's another vote from the same user on any of the other
>> >>    packages in the remove queue. If so, only move one of them to the new
>> >>    package. Maybe we could also group by user and remove such duplicates
>> >>    beforehand.
>> >>
>> >> We also have the limitation that MySQL doesn't allow to UPDATE a table
>> >> and SELECT from the same table in a subquery. So we will either have do
>> >> use deep nested subqueries, a self join or a temporary table here.
>> >>
>> >> Another idea that just came into my mind is that we could also extract
>> >> all potentially affected votes, group them by user, delete all records
>> >> but one from each group and finally update the remaining record to match
>> >> the target package's ID. Not sure if that can be implemented as a single
>> >> SQL query.
>> >>
>> >> I'll think about how to implement that in detail when ever I get around
>> >> to doing it...
>> >
>> > I tried implementing that in a single query and ended up with a bunch of
>> > nested subqueries which were rather confusing. Maybe we should really
>> > consider using a temporary table here - particularly since this will be
>> > a function that will be rarely used and performance shouldn't be that
>> > significant here (as opposed to maintainability).
>> >
>> > If anyone can come up with a simple query that covers all aspects,
>> > suggestions welcome!
>>
>> -- original query
>> UPDATE PackageVotes
>> SET PackageID = ?
>> WHERE PackageID IN (?, ?, ?);
>>
>> -- new queries
>> -- only update votes that wouldn't exist after modification
>> UPDATE PackageVotes
>> SET PackageID = ?merge
>> WHERE PackageID IN (?others)
>> AND UsersID NOT IN (
>>         -- this silly double SELECT works around shitty MySQL not
>> letting you reference the updated table in the update, as it forces
>> materialization of the subquery
>>       SELECT * FROM (
>>               SELECT UsersID
>>               FROM PackageVotes
>>               WHERE PackageID = ?merge
>>       ) temp
>> );
>
> That's what I had when I wrote the query for the very first time but
> that won't work if we merge more than one package at once. Read my
> next-to-last email in this thread for some more details.

Oh damn it, I thought I set up my test data right- I'll see what I can tweak.

>>
>> -- all remaining un-updated votes are dupes, delete them
>> DELETE FROM PackageVotes
>> WHERE PackageID IN (?, ?, ?);
>
> No need to do this. ON DELETE CASCADE, remember? :)
I know, but explicit >> implicit, especially once you wrap it in a
transaction, I don't know.

-Dan


More information about the aur-dev mailing list