[pacman-dev] [PATCH 5/5] makepkg: unify source file times for improved build reproducibility

Allan McRae allan at archlinux.org
Tue May 16 00:51:26 UTC 2017


On 13/05/17 01:09, Andrew Gregory wrote:
> On 05/12/17 at 12:41pm, Levente Polyak wrote:
>> Signed-off-by: Levente Polyak <anthraxx at archlinux.org>
>> ---
>>  scripts/makepkg.sh.in | 3 +++
>>  1 file changed, 3 insertions(+)
>>
>> diff --git a/scripts/makepkg.sh.in b/scripts/makepkg.sh.in
>> index bd92c526..83c80fa7 100644
>> --- a/scripts/makepkg.sh.in
>> +++ b/scripts/makepkg.sh.in
>> @@ -1731,6 +1731,9 @@ if (( !REPKG )); then
>>  		if (( PREPAREFUNC )); then
>>  			run_prepare
>>  		fi
>> +
>> +		# unify source times before building for reproducibility
>> +		find "$srcdir" -exec touch -h -d "@${SOURCE_DATE_EPOCH}" {} +
>>  	fi
>>  
>>  	if (( PKGVERFUNC )); then
> 
> I'm still not convinced we should be doing this in makepkg and I'm not
> sure exactly where our disagreement about this is at this point.  So,
> let me describe how I would handle reproducible packages and you can
> tell me why your approach is better.
> 
> First, I'm only concerned about manipulating things that makepkg is
> not directly responsible for.  Anything that improves the
> reproducibility of makepkg's own output is fine (e.g. removing the
> timestamp from .PKGINFO and .MTREE files).  
> 
> Beyond that, I don't think makepkg is the place for trying to make
> a package reproducible.  We seem to be in agreement that a separate
> script will be necessary to actually reproduce a built package.
> I would say that the same script should be used to build the original
> package in the first place.  Aside from the fact that changes like
> this break existing usage patterns for makepkg, makepkg is never going
> to be able to guarantee that a package is reproducible.  There are
> simply too many variables that can influence the resulting package for
> makepkg to ever record them all.
> 
> Building a package that is reproducible requires a controlled
> environment.  The script that handles reproducing the package has to
> be able to set up such an environment, so why not let it set up the
> environment for the initial build?  People that don't care about the
> reproducibility of the resulting package can continue to use makepkg
> as they do now and those that do care can use the wrapper script to
> build it.  And, makepkg doesn't have to concern itself with
> reproducibility beyond making sure that its own output is
> reproducible.
> 
> Such a script could handle the timestamp manipulation in this patch.
> We already provide several makepkg options to control which steps are
> run.  The wrapper would invoke makepkg once to extract/prepare the
> sources, the wrapper would then adjust the timestamps itself, finally
> it would invoke makepkg again to continue the build.
> 
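
For reference, the wrapper flow described above could look roughly like
the sketch below.  The function names are hypothetical, and it assumes
GNU touch, that it is run next to the PKGBUILD, and that sources are
extracted to ./src (the default when BUILDDIR is unset).  It relies on
the existing --nobuild and --noextract options.

```shell
#!/bin/bash
# Sketch of a hypothetical wrapper; the function names are illustrative only.

# Set the mtime of everything under a directory to a fixed epoch.
# Relies on GNU touch for -h (do not follow symlinks) and -d "@<epoch>".
unify_mtimes() {
    local dir=$1 epoch=$2
    find "$dir" -exec touch -h -d "@${epoch}" {} +
}

# Assumes it is run next to the PKGBUILD and that sources land in ./src.
reproducible_build() {
    : "${SOURCE_DATE_EPOCH:?SOURCE_DATE_EPOCH must be set}"
    makepkg --nobuild || return 1       # download, extract, run prepare()
    unify_mtimes src "$SOURCE_DATE_EPOCH" || return 1
    makepkg --noextract                 # build from the existing src/ tree
}
```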

From my understanding, the reproducible build goal is to have a package
build identically on two invocations of the build tool.  There are some
specific things that need to be set to achieve this (mostly
SOURCE_DATE_EPOCH), but the rest of the environment can be quite
variable.  In fact, the testing framework used to check reproducibility
deliberately varies:

date and time,
build path,
hostname,
domain name,
filesystem,
environment variables,
timezone,
language,
locale,
user name,
user id,
group name,
group id,
kernel version,
umask,
CPU type,
number of CPU cores.

So the environment does not necessarily need to be exactly the same
between builds for it to be reproducible.
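
As a minimal illustration of what "reproducible" means here, two builds
made under different environments should produce bit-identical package
files.  The helper below is a hypothetical sketch (not part of makepkg
or any testing framework) that compares two files by SHA-256 digest.

```shell
#!/bin/bash
# Hypothetical helper: succeed only if both files have the same SHA-256.
same_artifact() {
    local first second
    first=$(sha256sum < "$1") || return 2    # unreadable file: status 2
    second=$(sha256sum < "$2") || return 2
    [[ $first == "$second" ]]
}
```

It would be called with the package file from each build, for example
same_artifact build1/foo.pkg.tar.xz build2/foo.pkg.tar.xz, where the
paths are placeholders.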


Given that I think Python packages are the primary problem here, I'm
going to propose another solution.  Clearly, embedding the timestamp in
the pyc/o files is a design decision and is not going to be changed.
Could we, however, have a pass in makepkg, in the "tidy" loop, that
generates these files?  That would allow us to set the times on any .py
files in the package and then generate the pyc/o files.  No setting of
source times needed.
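
A rough sketch of what such a pass might do is below.  The function name
and its arguments are hypothetical, this is not an existing makepkg tidy
function, and it assumes GNU touch plus a python3 interpreter on PATH.

```shell
#!/bin/bash
# Hypothetical tidy-style pass; not an existing makepkg function.
# Assumes GNU touch and a python3 interpreter on PATH.
compile_python_bytecode() {
    local root=$1 epoch=$2 py=${3:-python3}
    # Pin the mtime of every .py file so the source mtime recorded by
    # timestamp-based bytecode no longer depends on when the build ran.
    find "$root" -type f -name '*.py' -exec touch -d "@${epoch}" {} +
    # Generate the regular and the optimised (-O) bytecode files.
    "$py" -m compileall -q "$root" || return 1
    "$py" -O -m compileall -q "$root"
}
```

In makepkg this would presumably be called on the package directory
with SOURCE_DATE_EPOCH as the second argument.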

Allan

