noarching source packages

Jon TURNEY

Picking up the discussion from [1], I've been looking a bit at noarching
the source packages.

So, the first problem is that we don't really have source packages.

Instead there is a special package (conventionally, the main one) which
has a source archive as well as a binary archive, and all the
subpackages point to that package with an external-source: hint, which
becomes a source: line pointing to that source archive in setup.ini.
(i.e. in that case, the setup.ini contains multiple, identical source:
lines for each subpackage)

* setup

Surprisingly, there is support in setup for source packages (See Source:
vs. source: in [2]), and this seems to work.

Note that the name of the source package must be distinct from the
binary package, so this probably implies some naming convention for
source packages (e.g. the source package for foo is foo-src)

(These foo-src packages currently will be shown in the list of packages.
  This differs from current packages which are source-only, as these
aren't listed in setup.ini at all, and thus never shown in the list of
packages [this was what skip: used to indicate, but these cases are
automatically detected, these days])

So, perhaps a minor improvement to setup to remove all packages which
are source-only from the displayed package list is needed, or to add a
separate filter view which only shows source packages?

(Note that source archives are already treated specially in other ways,
e.g. the files are just dumped in /usr/src/ and the package is not
recorded as being installed)

* calm

calm would need updating to look for packages in src/ as well as noarch/
and <arch>/, and to emit 'Source:' rather than 'source:' lines in
setup.ini when the source is an actual source package.

* cygport

It's not quite clear how to deal with making source packages.  If we do
it when we make the binary package (as now), then there is the near
certainty that the source package made for a different arch will differ,
gratuitously.

(This will always be the case if gpg signing is turned on, as the .sig
inside the source archive is timestamped.  It will also be the case if
timestamps or filesystem file order are leaked into the archive)

This will lead to a rejected upload (as uploading the same package with
different contents is not allowed by calm)
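
As an illustration of the timestamp and file-order points above, here is a
minimal Python sketch of writing a tar archive whose bytes do not depend on
file mtimes, ownership, or directory listing order.  This is not how cygport
builds source archives; the function name deterministic_tar and its
fixed_mtime parameter are hypothetical, and the timestamped .sig issue is not
addressed here.

```python
import io
import os
import tarfile


def deterministic_tar(src_dir, fixed_mtime=0):
    """Return uncompressed tar bytes for src_dir with normalized metadata.

    Hypothetical helper: members are added in sorted order and every
    timestamp, owner and mode is overwritten with a fixed value.
    """
    paths = []
    for root, _dirs, files in os.walk(src_dir):
        for name in files:
            paths.append(os.path.join(root, name))

    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.GNU_FORMAT) as tar:
        # Sorting removes any dependence on filesystem listing order.
        for path in sorted(paths):
            arcname = os.path.relpath(path, src_dir).replace(os.sep, "/")
            info = tar.gettarinfo(path, arcname=arcname)
            info.mtime = fixed_mtime      # do not leak build timestamps
            info.uid = info.gid = 0       # do not leak the builder's identity
            info.uname = info.gname = ""
            info.mode = 0o644
            with open(path, "rb") as f:
                tar.addfile(info, f)
    return buf.getvalue()
```

Two trees with the same file names and contents then yield byte-identical
output, regardless of when or in which order the files were created.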

It's possible to make a separate step to make just the source package,
but perhaps this makes more work, as the maintainer will need to
explicitly do that (once), otherwise the upload will be rejected due to
not having a source.

This also potentially loses information, as the maintainer might adjust
the .cygport to build on the 2nd architecture they try, but those
changes wouldn't be uploaded (whereas currently the source actually
used for the build is uploaded)

The source package will now always require a separate .hint, so we need
a means to manually provide a .hint file for that.

Uploading needs to place the source package in the appropriate place.

* benefit

Applied retroactively, it looks like this would save about 13G (out of a
total mirror size of approximately 97G), but it seems that there are
many source packages which (usually spuriously) differ between arches,
so that saving wouldn't be immediately realized.

> sware[...ftp/pub/cygwin]$ find x86 -name \*-src\* -print0 | du --files0-from=- -hc | tail -n1
> 13G     total
> sware[...ftp/pub/cygwin]$ find x86_64 -name \*-src\* -print0 | du --files0-from=- -hc | tail -n1
> 13G     total

[1] https://cygwin.com/ml/cygwin-apps/2016-04/msg00039.html
[2] https://sourceware.org/cygwin-apps/setup.ini.html

Re: noarching source packages

Achim Gratz
Jon Turney writes:
> Picking up the discussion from [1], I've been looking a bit at
> noarching the source packages.
>
> So, the first problem is that we don't really have source packages.

I'll use this occasion to raise the topic of the debuginfo packages
again.  I still think we should change their naming convention (or
alternatively the naming convention for the source packages) and a large
part of them is made up again of the source files, which should be
separated out into noarch also.

> calm would need updating to look for packages in src/ as well as
> noarch/ and <arch>/, and to emit 'Source:' rather than 'source:' lines
> in setup.ini when the source is an actual source package.

I'd be hesitant to use yet another tree for this.  We already have way
too many directories that make up the repo.

> It's not quite clear how to deal with making source packages.  If we
> do it when we make the binary package (as now), then there is the near
> certainty that the source package made for a different arch will
> differ, gratuitously.

The only sane way is to mandate that the packages for all arches are
built together so that you can package the sources only once during the
packaging step.  Otherwise you either have to check that the contents
(ignoring the metadata that _will_ differ) are identical between the
source archives you've built separately and then choose one of those for
upload or you'll have to force a reproducible build of the source
archive at least.
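
The "check that the contents are identical, ignoring the metadata" option
could look roughly like the following Python sketch.  It compares two source
archives member by member using content hashes and ignores timestamps and
ownership.  The names content_digest, same_contents and ignore_suffixes are
hypothetical; excluding a timestamped .sig via ignore_suffixes is one possible
policy, not something calm or cygport is known to do.

```python
import hashlib
import tarfile


def content_digest(archive_path, ignore_suffixes=()):
    """Map each regular file in the archive to the SHA-256 of its contents."""
    digests = {}
    with tarfile.open(archive_path, "r:*") as tar:
        for member in tar:
            if not member.isfile():
                continue
            if member.name.endswith(tuple(ignore_suffixes)):
                continue  # e.g. a signature that is expected to differ
            data = tar.extractfile(member).read()
            digests[member.name] = hashlib.sha256(data).hexdigest()
    return digests


def same_contents(archive_a, archive_b, ignore_suffixes=()):
    """True if both archives hold the same file names with the same bytes."""
    return (content_digest(archive_a, ignore_suffixes)
            == content_digest(archive_b, ignore_suffixes))
```

Note that a nested upstream tarball is compared as opaque bytes here, so this
only catches differences at the level of the outer source archive's members.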

> This also potentially loses information, as the maintainer might
> adjust the .cygport to build on the 2nd architecture they try, but
> those changes wouldn't be uploaded, (whereas currently the source
> actually used for the build is uploaded)

It's easy enough to branch on that decision inside the cygport file, and
the only times I did that have passed now that the package content in both
arches is almost identical.  So is anybody really doing that currently?

But the real problem is that besides our own stuff some upstream sources
are archful.

> Applied retroactively, it looks like this would save about 13G (out of
> a total mirror size of approximately 97G), but it seems that there are
> many source packages which (usually spuriously) differ between arches,
> so that saving wouldn't be immediately realized.

From my last dedup exercise (where my local Cygwin repo was around 80GB
since I don't mirror some of the cross-compilation and KDE packages)
doing the dedup on just the source and doc packages reduced the size of
the repo by 30GB.  I'll note again that if it was possible to split off
the noarch part of _all_ packages the gains would be larger than that.
The way it would work is that setup.exe should accept both noarch and
arch archives for the same package.  It would then proceed to first
install the noarch and then the arch part if it finds both of them.
Incidentally, this would keep the current tree structure intact and
allow us to freely move packages from arch to noarch and vice versa
between different releases with no manual intervention.
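
For readers who want to reproduce this kind of estimate on their own mirror,
here is a minimal Python sketch that sums the size of files in one tree whose
contents also appear in another tree.  It is an illustration only, not the
procedure used for the figures quoted in this thread; the function names
file_sha256 and redundant_bytes are hypothetical.

```python
import hashlib
import os


def file_sha256(path, chunk=1 << 20):
    """Hash a file in chunks so large archives need not fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            h.update(block)
    return h.hexdigest()


def redundant_bytes(tree_a, tree_b):
    """Total size of files in tree_b whose content also exists in tree_a."""
    seen = set()
    for root, _dirs, files in os.walk(tree_a):
        for name in files:
            seen.add(file_sha256(os.path.join(root, name)))

    total = 0
    for root, _dirs, files in os.walk(tree_b):
        for name in files:
            path = os.path.join(root, name)
            if file_sha256(path) in seen:
                total += os.path.getsize(path)
    return total
```

Calling redundant_bytes on the two arch trees gives an estimate of what is
already byte-identical; archives that differ only spuriously are not counted.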


Regards,
Achim.
--
+<[Q+ Matrix-12 WAVE#46+305 Neuron microQkb Andromeda XTk Blofeld]>+

Factory and User Sound Singles for Waldorf Q+, Q and microQ:
http://Synth.Stromeko.net/Downloads.html#WaldorfSounds

Re: noarching source packages

Jon TURNEY
On 27/04/2017 18:53, Achim Gratz wrote:
> Jon Turney writes:
>> Picking up the discussion from [1], I've been looking a bit at
>> noarching the source packages.
>>
>> So, the first problem is that we don't really have source packages.
>
> I'll use this occasion to raise the topic of the debuginfo packages
> again.  I still think we should change their naming convention (or
> alternatively the naming convention for the source packages) and a large

What is your reason for changing the name?

I was wondering if we need to explicitly identify debuginfo archives as
a different kind of thing.  Currently, debuginfo packages work just like
any other install archive, which is fine, except for perhaps they need a
separate filter in setup.

> part of them is made up again of the source files, which should be
> separated out into noarch also.

Nice idea, but the practicalities seem complex (e.g. generated source
files need to be treated correctly). In any case, this would seem to be
a piece of work which falls after noarching the sources.

>> calm would need updating to look for packages in src/ as well as
>> noarch/ and <arch>/, and to emit 'Source:' rather than 'source:' lines
>> in setup.ini when the source is an actual source package.
>
> I'd be hesitant to use yet another tree for this.  We already have way
> too many directories that make up the repo.

'too many'? why?

>> It's not quite clear how to deal with making source packages.  If we
>> do it when we make the binary package (as now), then there is the near
>> certainty that the source package made for a different arch will
>> differ, gratuitously.
>
> The only sane way is to mandate that the packages for all arches are
> built together so that you can package the sources only once during the
> packaging step.  Otherwise you either have to check that the contents

That would seem to require a cross-compilation environment for at least
one cygwin arch, with all the dependencies available.

> (ignoring the metadata that _will_ differ) are identical between the
> source archives you've built separately and then choose one of those for
> upload or you'll have to force a reproducible build of the source
> archive at least.
>
>> This also potentially loses information, as the maintainer might
>> adjust the .cygport to build on the 2nd architecture they try, but
>> those changes wouldn't be uploaded, (whereas currently the source
>> actually used for the build is uploaded)
>
> It's easy enough to branch on that decision inside the cygport file, and
> the only times I did that have passed now that the package content in both
> arches is almost identical.  So is anybody really doing that currently?

At the moment, nothing prevents SRC_URI and PATCH_URI depending on the
ARCH, so we just don't know.

But this is more a question of workflow: nothing stops the maintainer
going back and changing the source package, then just rebuilding one
architecture.

The ideal solution would be a build service which accepts a source
package and produces the install archives, but I don't see that
happening anytime soon...

> But the real problem is that besides our own stuff some upstream sources
> are archful.

Examples?

>> Applied retroactively, it looks like this would save about 13G (out of
>> a total mirror size of approximately 97G), but it seems that there are
>> many source packages which (usually spuriously) differ between arches,
>> so that saving wouldn't be immediately realized.
>
> From my last dedup exercise (where my local Cygwin repo was around 80GB
> since I don't mirror some of the cross-compilation and KDE packages)
> doing the dedup on just the source and doc packages reduced the size of
> the repo by 30GB.  I'll note again that if it was possible to split off
> the noarch part of _all_ packages the gains would be larger than that.
> The way it would work is that setup.exe should accept both noarch and
> arch archives for the same package.  It would then proceed to first
> install the noarch and then the arch part if it finds both of them.
> Incidentally, this would keep the current tree structure intact and
> allow us to freely move packages from arch to noarch and vice versa
> between different releases with no manual intervention.

Great.  I look forward to reading the patches :)


Re: noarching source packages

Achim Gratz
Jon Turney writes:
> What is your reason for changing the name?

There shouldn't be two different naming conventions for the same
purpose.  So

package-version-release[-purpose].tar.xz

with purpose:=[source|debuginfo] would be preferable.
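
To make the proposed package-version-release[-purpose].tar.xz scheme concrete,
here is a small Python sketch of a parser for it.  It assumes that neither the
version nor the release contains a hyphen and that no release is literally
named "source" or "debuginfo"; the names ArchiveName and parse_archive_name
are hypothetical and do not exist in setup or calm.

```python
from typing import NamedTuple, Optional

PURPOSES = ("source", "debuginfo")


class ArchiveName(NamedTuple):
    package: str
    version: str
    release: str
    purpose: Optional[str]


def parse_archive_name(filename):
    """Split package-version-release[-purpose].tar.xz into its parts."""
    suffix = ".tar.xz"
    if not filename.endswith(suffix):
        raise ValueError("not a .tar.xz archive: %r" % filename)
    stem = filename[:-len(suffix)]

    # Peel off the optional purpose first so it cannot be taken for a release.
    purpose = None
    for candidate in PURPOSES:
        if stem.endswith("-" + candidate):
            purpose = candidate
            stem = stem[:-(len(candidate) + 1)]
            break

    # Package names may contain hyphens, so split from the right.
    parts = stem.rsplit("-", 2)
    if len(parts) != 3:
        raise ValueError("cannot find version and release in %r" % filename)
    package, version, release = parts
    return ArchiveName(package, version, release, purpose)
```

For example, parse_archive_name("foo-bar-1.2.3-1-debuginfo.tar.xz") yields
package "foo-bar", version "1.2.3", release "1" and purpose "debuginfo".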

> I was wondering if we need to explicitly identify debuginfo archives
> as a different kind of thing.  Currently, debuginfo packages work just
> like any other install archive, which is fine, except for perhaps they
> need a separate filter in setup.

They wouldn't with the above naming convention and you'd just tick
another box to say you want them installed, just like sources.  We might
even skip the archful directories and just do [noarch|x86|x86_64] as
well in the same place.

>> part of them is made up again of the source files, which should be
>> separated out into noarch also.
>
> Nice idea, but the practicalities seem complex (e.g. generated source
> files need to be treated correctly). In any case, this would seem to
> be a piece of work which falls after noarching the sources.

Agreed.

>> I'd be hesitant to use yet another tree for this.  We already have way
>> too many directories that make up the repo.
>
> 'too many'? why?

I currently have to pull the mirror through a HTTP proxy, and most of
the time is spent in traversing directories.  Yes, it'd be possible to
determine which packages are missing and directly pull those, but I
haven't got around to scripting that yet.
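
A rough Python sketch of the "determine which packages are missing" idea
follows.  It reads the install: and source: lines of a setup.ini, takes the
first field as a path relative to the mirror root, and reports the paths that
do not exist locally.  The function name missing_archives is hypothetical, and
size or checksum verification and the actual download are left out.

```python
import os


def missing_archives(setup_ini_text, mirror_root):
    """List archive paths named in setup.ini that are absent under mirror_root."""
    missing = []
    seen = set()
    for line in setup_ini_text.splitlines():
        key, _, rest = line.partition(":")
        if key not in ("install", "source"):
            continue
        fields = rest.split()
        if not fields:
            continue
        rel_path = fields[0]          # remaining fields: size and checksum
        if rel_path in seen:
            continue                  # subpackages repeat the same source: line
        seen.add(rel_path)
        local = os.path.join(mirror_root, *rel_path.split("/"))
        if not os.path.exists(local):
            missing.append(rel_path)
    return missing
```

The returned list could then be handed to whatever HTTP client works through
the proxy, avoiding any directory traversal on the remote side.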

>> The only sane way is to mandate that the packages for all arches are
>> built together so that you can package the sources only once during the
>> packaging step.  Otherwise you either have to check that the contents
>
> That would seem to require a cross-compilation environment for at
> least one cygwin arch, with all the dependencies available.

Not necessarily.  You just need to package both with the same step.  But
yes, cygport makes this perhaps a bit harder than it should be.

>> It's easy enough to branch on that decision inside the cygport file, and
>> the only times I did that have passed now that the package content in both
>> arches is almost identical.  So is anybody really doing that currently?
>
> At the moment, nothing prevents SRC_URI and PATCH_URI depending on the
> ARCH, so we just don't know.
>
> But this is more a question of workflow: nothing stops the maintainer
> going back and changing the source package, then just rebuilding one
> architecture.

So just define some variable in *.cygport that says "I'm not doing any
of that nonsense and want to build for two arches".  Unless it's set,
everything stays as it is today.

> The ideal solution would be a build service which accepts a source
> package and produces the install archives, but I don't see that
> happening anytime soon...

Me neither.

>> But the real problem is that besides our own stuff some upstream sources
>> are archful.
>
> Examples?

Last I looked, it was texlive.  No idea why.

>> The way it would work is that setup.exe should accept both noarch and
>> arch archives for the same package.  It would then proceed to first
>> install the noarch and then the arch part if it finds both of them.
>> Incidentally, this would keep the current tree structure intact and
>> allow us to freely move packages from arch to noarch and vice versa
>> between different releases with no manual intervention.
>
> Great.  I look forward to reading the patches :)

You're talking about setup.exe, calm or both?  I'm tied up at work
for the foreseeable future, so I can't spend many cycles on it.


Regards,
Achim.
--
+<[Q+ Matrix-12 WAVE#46+305 Neuron microQkb Andromeda XTk Blofeld]>+

SD adaptation for Waldorf rackAttack V1.04R1:
http://Synth.Stromeko.net/Downloads.html#WaldorfSDada

Re: noarching source packages

Ken Brown-6
On 5/1/2017 3:57 PM, Achim Gratz wrote:
>>> But the real problem is that besides our own stuff some upstream sources
>>> are archful.
>>
>> Examples?
>
> Last I looked, it was texlive.

This might go back to the time when biber was distributed as a packed
perl archive on x86 but not x86_64.  But it hasn't been the case for a
while.

Ken

Re: noarching source packages

Ken Brown-6
On 5/1/2017 4:05 PM, Ken Brown wrote:

> On 5/1/2017 3:57 PM, Achim Gratz wrote:
>>>> But the real problem is that besides our own stuff some upstream sources
>>>> are archful.
>>>
>>> Examples?
>>
>> Last I looked, it was texlive.
>
> This might go back to the time when biber was distributed as a packed perl
> archive on x86 but not x86_64.

No, it was actually due to the existence of source files of the form

     <pkg>.<cpu>-cygwin.tar.xz.

But it was fixed a year ago.  See the discussion at

     https://sourceware.org/ml/cygwin-apps/2016-05/msg00049.html

and cygport commit 5c559d5ea49d69116d3073b68c8fb1e70522370a.

Ken

Re: noarching source packages

Jon TURNEY
On 03/05/2017 12:50, Ken Brown wrote:

> On 5/1/2017 4:05 PM, Ken Brown wrote:
>> On 5/1/2017 3:57 PM, Achim Gratz wrote:
>>>>> But the real problem is that besides our own stuff some upstream
>>>>> sources
>>>>> are archful.
>>>>
>>>> Examples?
>>>
>>> Last I looked, it was texlive.
>>
>> This might go back to the time when biber was distributed as a packed
>> perl archive on x86 but not x86_64.
>
> No, it was actually due to the existence of source files of the form
>
>     <pkg>.<cpu>-cygwin.tar.xz.
>
> But it was fixed a year ago.  See the discussion at
>
>     https://sourceware.org/ml/cygwin-apps/2016-05/msg00049.html
>
> and cygport commit 5c559d5ea49d69116d3073b68c8fb1e70522370a.

Interesting.

Anyhow, it seems that any cases of this we know of are bugs or mistakes.

We can always adopt the solution here, where the source package contains
sources for both arches (ofc, if those are fundamentally different,
there's a question as to in what sense they are the "same" package
anyhow... :-))

Re: noarching source packages

Jon TURNEY
In reply to this post by Achim Gratz
On 01/05/2017 20:57, Achim Gratz wrote:
> Jon Turney writes:
>> What is your reason for changing the name?
>
> There shouldn't be two different naming conventions for the same
> purpose.  So
>
> package-version-release[-purpose].tar.xz
>
> with purpose:=[source|debuginfo] would be preferable.

If we were starting from scratch, maybe.

The assumption that the "package" part is unique for installable
packages is rather deeply entrenched, and I don't actually see any
benefit apart from the aesthetic in changing this now.

If we're going for a foolish consistency, naming things as
package-version[-purpose]-release would be probably easier to implement :-)

>> I was wondering if we need to explicitly identify debuginfo archives
>> as a different kind of thing.  Currently, debuginfo packages work just
>> like any other install archive, which is fine, except for perhaps they
>> need a separate filter in setup.
>
> They wouldn't with the above naming convention and you'd just tick
> another box to say you want them installed, just like sources.  We might
> even skip the archful directories and just do [noarch|x86|x86_64] as
> well in the same place.

I think it would be much better to have the associated debuginfo for a
package described in setup.ini, rather than mapping package name ->
source package name -> debuginfo package name, as you seem to be suggesting.

>>> part of them is made up again of the source files, which should be
>>> separated out into noarch also.
>>
>> Nice idea, but the practicalities seem complex (e.g. generated source
>> files need to be treated correctly). In any case, this would seem to
>> be a piece of work which falls after noarching the sources.
>
> Agreed.
>
>>> I'd be hesitant to use yet another tree for this.  We already have way
>>> too many directories that make up the repo.
>>
>> 'too many'? why?
>
> I currently have to pull the mirror through a HTTP proxy, and most of
> the time is spent in traversing directories.  Yes, it'd be possible to
> determine which packages are missing and directly pull those, but I
> haven't got around to scripting that yet.

Ah, "too many" in some specific and limited sense. :-)

I'm not sure how many people are in the situation of "I want to maintain a
mirror, but can't use rsync".

It seems a reasonable intuition that a more compact directory tree would
be somewhat more efficient, but that is basically saying that the
connection setup time for transferring index.html dominates.

Have you tried a HTTP mirroring tool which can parallelize its requests
(assuming such a thing exists, I think axel can do that)?




Re: noarching source packages

Achim Gratz
Jon Turney writes:
> The assumption that the "package" part is unique for installable
> packages is rather deeply entrenched, and I don't actually see any
> benefit apart from the aesthetic in changing this now.

Well, it's not really the aesthetics: the debuginfo package is, like the
source package, something auxiliary, so it still irks me that it is
treated as a first class (sub-)package.

> If we're going for a foolish consistency, naming things as
> package-version[-purpose]-release would be probably easier to
> implement :-)

*shrugs*  That boat has sailed exactly how long ago?

> I think it would be much better to have the associated debuginfo for a
> package described in setup.ini, rather than mapping package name ->
> source package name -> debuginfo package name, as you seem to be suggesting.

I'd settle for consistent treatment with the other auxiliary package,
namely sources.

> I'm not sure how many people are in the situation of "I want to maintain
> a mirror, but can't use rsync".

I'm quite certain that practically all corporate networks are walled off
one way or the other.  In fact I've gone through the exercise of
requesting rsync access to the outside, only to be denied.

> It seems a reasonable intuition that a more compact directory tree
> would be somewhat more efficient, but that is basically saying that
> the connection setup time for transferring index.html dominates.

It does, and the proxy I need to go through decidedly slows down the
access even further.  I know since I can do the same thing from home
considerably faster.

> Have you tried a HTTP mirroring tool which can parallelize its
> requests (assuming such a thing exists, I think axel can do that)?

I already have a quite elaborate lftp script for doing that that does
several things in parallel.


Regards,
Achim.
--
+<[Q+ Matrix-12 WAVE#46+305 Neuron microQkb Andromeda XTk Blofeld]>+

SD adaptation for Waldorf microQ V2.22R2:
http://Synth.Stromeko.net/Downloads.html#WaldorfSDada

Re: noarching source packages

Brian Inglis
In reply to this post by Achim Gratz
On 2017-05-01 13:57, Achim Gratz wrote:

> Jon Turney writes:
>> What is your reason for changing the name?
> There shouldn't be two different naming conventions for the same
> purpose. So
> package-version-release[-purpose].tar.xz
> with purpose:=[source|debuginfo] would be preferable.
>> I was wondering if we need to explicitly identify debuginfo
>> archives as a different kind of thing. Currently, debuginfo
>> packages work just like any other install archive, which is fine,
>> except for perhaps they need a separate filter in setup.
> They wouldn't with the above naming convention and you'd just tick
> another box to say you want them installed, just like sources. We
> might even skip the archful directories and just do
> [noarch|x86|x86_64] as well in the same place.

In the same vein, boxes for doc and devel packages, where available,
would make selection easier for users, developers, and maintainers,
but require changes to cygport and setup to integrate handling.

--
Take care. Thanks, Brian Inglis, Calgary, Alberta, Canada

Re: noarching source packages

marco atzeri-4
On 06/05/2017 02:20, Brian Inglis wrote:

> On 2017-05-01 13:57, Achim Gratz wrote:
>> Jon Turney writes:
>>> What is your reason for changing the name?
>> There shouldn't be two different naming conventions for the same
>> purpose. So
>> package-version-release[-purpose].tar.xz
>> with purpose:=[source|debuginfo] would be preferable.
>>> I was wondering if we need to explicitly identify debuginfo
>>> archives as a different kind of thing. Currently, debuginfo
>>> packages work just like any other install archive, which is fine,
>>> except for perhaps they need a separate filter in setup.
>> They wouldn't with the above naming convention and you'd just tick
>> another box to say you want them installed, just like sources. We
>> might even skip the archful directories and just do
>> [noarch|x86|x86_64] as well in the same place.

"Tick the box" is already one of the least understood features of setup;
I suggest not adding further stuff there.

debuginfo and src have a major difference in installation:

debuginfo packages are tracked as normal packages, recorded in "/etc/setup/",
and they can be uninstalled like any other package.

src packages are not tracked at all and cannot be uninstalled
with setup. If we decide to manage them like the other
packages, we should remove this anomaly.

> In the same vein, boxes for doc and devel packages, where available,
> would make selection easier for users, developers, and maintainers,
> but require changes to cygport and setup to integrate handling.

this is IMHO a useless complication; the search filter is already
available for selecting similar packages.

We have multiple documentation or multiple development packages
for a single source package so clustering is not obvious at all

Regards
Marco




Re: noarching source packages

Brian Inglis
On 2017-05-06 08:38, Marco Atzeri wrote:

> On 06/05/2017 02:20, Brian Inglis wrote:
>> On 2017-05-01 13:57, Achim Gratz wrote:
>>> Jon Turney writes:
>>>> What is your reason for changing the name?
>>> There shouldn't be two different naming conventions for the same
>>> purpose. So
>>> package-version-release[-purpose].tar.xz
>>> with purpose:=[source|debuginfo] would be preferable.
>>>> I was wondering if we need to explicitly identify debuginfo
>>>> archives as a different kind of thing. Currently, debuginfo
>>>> packages work just like any other install archive, which is fine,
>>>> except for perhaps they need a separate filter in setup.
>>> They wouldn't with the above naming convention and you'd just tick
>>> another box to say you want them installed, just like sources. We
>>> might even skip the archful directories and just do
>>> [noarch|x86|x86_64] as well in the same place.
> "Tick the box" is already one of the least understood features of setup;
> I suggest not adding further stuff there.
> debuginfo and src have a major difference in installation:
> debuginfo packages are tracked as normal packages, recorded in
> "/etc/setup/", and they can be uninstalled like any other package.
> src packages are not tracked at all and cannot be uninstalled with
> setup. If we decide to manage them like the other packages, we should
> remove this anomaly.

The automatic/manual pick bit in installed.db could be extended to
also add flag bits for installed -src, -doc, -devel, and -debuginfo
pkgs, and it would also be useful to add flag bits for base pkgs and
a sticky keep/hold bit to support that long requested feature.
I've been looking at it from the perspective of adding apt-cyg
features available in Debian apt{,-*}, without keeping info separate
from setup which could become inconsistent.
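
Purely as an illustration of the flag-bit idea, here is a Python sketch of how
such bits might be combined into one integer field.  The bit values, the names
PkgFlag, format_entry and parse_entry, and the three-field "name archive
flags" line layout are all assumptions made for this example; only the lowest
bit is meant to correspond to the existing automatic/manual pick bit, and
nothing here reflects what setup actually reads or writes.

```python
from enum import IntFlag


class PkgFlag(IntFlag):
    """Hypothetical flag bits for an installed.db entry."""
    USER_PICKED = 1    # assumed to match the existing manual-pick bit
    SRC = 2            # related -src installed
    DOC = 4            # related -doc installed
    DEVEL = 8          # related -devel installed
    DEBUGINFO = 16     # related -debuginfo installed
    BASE = 32          # package belongs to the base set
    HOLD = 64          # sticky keep/hold


def format_entry(name, archive, flags):
    """Serialize one entry as 'name archive flags' (assumed layout)."""
    return "%s %s %d" % (name, archive, int(flags))


def parse_entry(line):
    """Parse an entry back into (name, archive, PkgFlag)."""
    name, archive, raw = line.split()
    return name, archive, PkgFlag(int(raw))
```

With this layout, entries whose last field is a plain 0 or 1 would still parse
unchanged, which is the property that makes extending a single field
attractive.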

>> In the same vein, boxes for doc and devel packages, where
>> available, would make selection easier for users, developers, and
>> maintainers, but require changes to cygport and setup to integrate
>> handling.
> this is IMHO a useless complication; the search filter is already
> available for selecting similar packages.
> We have multiple documentation or multiple development packages for a
> single source package so clustering is not obvious at all

For each binary arch install, is there more than one of each related
-devel, -debuginfo, -doc, or -src pkg?

--
Take care. Thanks, Brian Inglis, Calgary, Alberta, Canada