NetBSD News Beat
— You heard it here —
|
![]() |
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
2010-03-04A space-efficient alternative to db and cdb Many applications require a key,value database, but the data itself is (mostly) constant. General purpose hash databases are inefficient for this purpose as they waste a lot of space and often also require more complex queries. The most well-known constant database is from DJB, but it is a number of limitations.The biggest issues of DJB's implementation concern the large overhead (24 Bytes / record) and the inefficient hash function. Bob Jenkins has a comparison. To work around this, I started to design a new constant database format. As a building block I used the CHM algorithm for a minimal perfect hash function. An implementation for that already existed in nbperf(1). I decided to use a separate offset table as it provides a natural encoding of the entry size and makes it easier to allow multiple keys for the same value. To minimize the storage I decided against handling key matching in the library. The pay load for many of the intended uses already contains the key, so storing it separately would waste space. The perfect hashing ensures that only a single entry has to be matched, so the overhead is generally small, too. The result is quite impressive. For the /etc/service database, the cdb version is 304KB, the original db(3) database 2.1MB (4.2MB file size, but sparse). The run time of services_mkdb dropped from 2.2s before to 0.24s afterwards. For terminfo, the cdb version is 1.3MB compared to 2.1MB for db(3) (2.8MB file size, but sparse). The run time of tic -x dropped from 1.5s to 0.4s. Code size is quite small as well, the reader itself is 5.2KB code and the writer 15KB. Compiled for AMD64, the reader is 1.2KB and the writer 3.6KB. 2010-03-012010-02-27The NetBSD runtime linker now has a negative symbol cache. In a nutshell, this has reduced the startup time of the Evolution mail client from around 5 minutes to 3 seconds on my QuadCore amd64 machine. Not many applications have a lot of plugins with a large amount of links to external libraries and I doubt many other applications will gain such a drastic speed bump, but the GNOME desktop as a whole now loads small bit quicker. I would imagine that KDE will now load faster as well. 2010-02-26Hackathon
13 Results 2010-02-25What is left before USE_DESTDIR=yes can be the default for everyone? DESTDIR support is one of the biggest internal
changes since the introduction of package barrier and
it has some user visible consequences.
A new name for a build phase, a new behavior for DESTDIR users and some black magic. When the DESTDIR support was originally introduced
3.5 years ago, I tried to keep it as non-intrusive in
the infrastructure as possible. Fast forward 3 years to the current time. DESTDIR support is now supported by almost all packages. 384 package locations without DESTDIR support remains (compared to 8302 with). To avoid unnecessary regressions from developers not testing it, I made USE_DESTDIR=yes the default for PKG_DEVELOPER a while ago. That's when the complains and insults started. Due to the change, "make install" now
suddenly doesn't modify /usr/pkg any longer.
The other direct consequence is that
"DEPENDS_TARGET=package" breaks as well. First of all, a bit of search and replace. Second, dealing with the surprises. Just renaming install to stage-install -- fails. The install-vars target has be renamed as well, as the necessary files are created. More fun was the addition of the new target. Just adding
.PHONY: install
.if ${_USE_DESTDIR} == "no"
install: stage-install
.else
install: package-install
.endif
...doesn't work. It surprisingly does nothing. What happened? pkgsrc computes a number of variables based on the dependencies. For example, BUILDLINK_PREFIX depends on whether the package is builtin and if not, where it is found. This got quite expensive when pkgviews was introduced. A long time ago, pkgsrc therefore run a new make instance for every major build phase (depends, extract, build, install). This was changed later with the introduction of the pkgsrc barrier. Essentially, the build phases are separated into pre-barrier and post-barrier phases, with a separate make invocation in between. How does this affect the install target above?
install has to be a post-barrier operation otherwise
it ends up effectively invocing build, but not the
desired install rule. pbulk includes logic to avoid rebuilding packages
that haven't changed based on the RCS ID
($NetBSD$). pbulk contains a script to determine whether a package has to be rebuilt or not. There are three checks performed by default:
The third condition is automatically valid after a bulk build, otherwise the system has time keeping or file system issues. The other conditions are more interesting. The first condition can trigger a permanent rebuild if files are formed in a way that the +BUILD_INFO processor extracts incomplete or additional RCS IDs. The rdigest package for example had a patch starting with
@@ -41,8 +41,17 @@ __RCSID("$NetBSD: digest.c,v 1.15 2007/0
As the RCS IDs are extracted with a simple grep expression, this ended up overwriting the original RCS ID at the top. Another example was url2pg, which contained the following statement in a local file:
print PLIST ("\@comment \$NetBSD\$\n");
This is picked up too and should be escape, in this case by splitting the string into two parts in the middle of NetBSD. Interestingly, the second condition is violated in one case as well. p5-DBIx-Class-EncodedColumn depends on p5-Digest-SHA. The version of p5-Digest-SHA is higher than the version of the Perl package, so the dependency resolver picks up that. During the build, pkgsrc decides that perl is already present and good enough, so skips the dependency. I'm not sure how to best address this yet. 2010-02-01Feeds
|
Hosted by Global Wire Oy
|