Symbol versioning extensions to ELF

We are starting with Sun's system, as implemented by them. It is more fully described below, with references to Sun's documentation. Here is a normal Sun style of config file:

VERS_1.1 {
         global:
                 foo1;
         local:
                 bar2;
                 *_tmp;
};

VERS_1.2 {
                 foo2;
} VERS_1.1;

VERS_2.0 {
} VERS_1.2;

Here 3 versions are defined, VERS_1.1, VERS_1.2, and VERS_2.0. There are dependencies specified so that VERS_1.2 depends upon VERS_1.1, and VERS_2.0 depends upon VERS_1.2. Note that there is an implied base version prior to VERS_1.1 which holds all symbols not specified in the mapfile. Finally, some of the symbols in the library are bound to specific levels of interface, and it is possible via the local directive to prevent some functions from being exported from the shared library.

There are two problems with Sun's approach. The first is that multiple incompatible versions of the same function cannot be present in the same library, and the second is that the library maintainer needs to edit some master config file when an interface changes. We would like to be able to specify the version of the function in the same source file as the definition.

Let us say we have 4 different implementations of the function foo(). The versions themselves could be specified in the following manner in the source file where they are defined:

int bar(void);                  /* defined elsewhere in the library */

int original_foo(void) { return 1+bar(); }
int old_foo(void)      { return 2+bar(); }
int old_foo1(void)     { return 3+bar(); }
int new_foo(void)      { return 4+bar(); }

__asm__(".symver original_foo,foo@");
__asm__(".symver old_foo,foo@VERS_1.1");
__asm__(".symver old_foo1,foo@VERS_1.2");
__asm__(".symver new_foo,foo@@VERS_2.0");

The function names original_foo, old_foo, old_foo1, and new_foo are names of convenience that are used to internally reference the functions. They don't need to be exported from the shared library itself (you can use the local: directive in the mapfile and exclude anything like "old_*", for example).
Currently they cannot be static functions - they must have the same visibility as the function you are trying to export (i.e. global, weak).

The four assembly pseudo-ops at the end are the ones that bind the version labels to the different implementations. The first one, "foo@", is bound to the implied base version of the function (this would normally be the first version of the function which a given library had). There are 3 other implementations that are bound to the other version nodes.

Note that the VERS_2.0 version is bound with foo@@VERS_2.0 (it uses a @@ rather than an @). This signifies that the symbol is the default version of the function, to which external references are bound if they don't explicitly specify a version number.

In general, the version specification tags won't appear in any header files, and will only go in the library sources themselves, so external references will tend to be just for 'foo'.

If you are absolutely sure that you need to bind to a non-default interface, then you will have to fully specify the version you want in the place where you reference it. An example of how you do this is not yet available.

There is no requirement that a 'default' version of a given interface actually exist. If you are deprecating a given function and wish to remove it from the library so that no new applications can bind against it, then it would actually make sense not to have a default version. Old applications would still be able to bind to the specified versions of the functions at runtime and do the right thing, so only new applications would be affected by this.

Sun does have this concept of a 'weak' version, which is a version node that has no symbols bound into it. The idea is that if you make a bugfix release of some kind, then there are no new functions, and no old functions being removed.
There is apparently no functional reason for the weak version to exist, but the purpose would seem to be that it allows you the same sort of control that library minor number versioning would provide.

An application linked against a versioned library will have a list of the required interfaces. When the application is linked against the shared library, the version of all of the symbols imported from the shared library is noted, and a list of the required interfaces is generated. At runtime, the dynamic loader should walk the list of required interfaces along with the list of defined interfaces in each of the dynamic objects that are loaded, and the dynamic loader will report an error if a required interface is missing. In effect, this feature is a much more sophisticated type of minor version number handling. Note that weak version nodes don't have any symbols explicitly bound to them, so that if you run an application with an older shared library than the application was linked against, there will be no error as long as all of the required interfaces are present.

Let us say that a new version of a library is released, with a new version node VERS_4.0, and this node contains the symbol xyzzy(). Let us also say you have an application that does not require xyzzy() - this means there will never be a dependence upon VERS_4.0 in the application no matter what version of the library it is linked against. Now let us say that you have a second application that does require xyzzy(). When you link it against the new library, it will have a version dependence upon VERS_4.0, and the dynamic linker should refuse to run the application if you are using an older library. In this way we guarantee that there are no dynamic linker time bombs present.

--------------------------------------------------------------------------------

Library maintainer's job.

The new job of the library maintainer is mainly to maintain the mapfile.
Every time there is a new release, a new version node should be added - this in effect replaces the concept of 'minor number/patch level' that we used to have in the old a.out days.

Creating incompatible interfaces should be an *extremely* rare occurrence. If you need to do it frequently, you are probably doing something wrong, or misusing this feature. In those rare cases where you do need to create an incompatible interface, you need to do a couple of things:

        1) Create the entry in the mapfile (which you would have to do anyway).

        2) Locate the source for the old default interface, and change it so that it is no longer the 'default'. This is so that new applications will not be able to link against it.

        3) Add the new interface and use the '@@' in the .symver directive to mark it as the new default.

--------------------------------------------------------------------------------

Specifying the version script.
------------------------------

There are two ways to specify a version script to the linker. The first is to use something like:

        $ ld --shared ... --version-script foo.map

where foo.map has a format like:

>VERS_1.1 {
>        foo1;
>};

It is also possible to specify the mapfile directly as an input file to the linker:

        $ ld --shared ... foo2.map

>VERSION {
>        VERS_1.1 {
>                foo1;
>        };
>}

--------------------------------------------------------------------------------

Diagnosing problems - verifying correct usage.

There are a couple of ways to determine whether everything is working correctly. The main diagnostic tool is objdump.
Here is an example of what it will do:

>bash$ ../binutils/objdump --dynamic-syms test2.so
>
>test2.so:     file format elf32-i386
>
>DYNAMIC SYMBOL TABLE:
>000012c0 g    DO *ABS*  00000000 _DYNAMIC
>000012b0 g    DO *ABS*  00000000 _GLOBAL_OFFSET_TABLE_
>000002a0 g    DF .text  0000000c main@@GNU_1.1
>00000000 l    D  *UND*  00000000
>000002ac g    DO *ABS*  00000000 _etext
>00001360 g    DO *ABS*  00000000 _edata
>00001360 g    DO *ABS*  00000000 __bss_start
>00001360 g    DO *ABS*  00000000 _end
>00000000      DF *UND*  0000001d foo@SUNW_1.3a
>00000000 g    DO *ABS*  00000000 GNU_1.1

Note that the symbols are shown with the version numbers associated. In this example, the implementation of main() is shown to be part of the GNU_1.1 version. There is an import of foo(), and we require that foo() be the one associated with the SUNW_1.3a version.

>bash$ ../binutils/objdump --dynamic-reloc test2.so
>
>test2.so:     file format elf32-i386
>
>DYNAMIC RELOCATION RECORDS
>OFFSET   TYPE              VALUE
>000012bc R_386_JUMP_SLOT   foo@SUNW_1.3a

Here we see a dynamic relocation, and again the full specification of the symbol is shown.

Finally there is one other bit that can be displayed with objdump:

>bash$ ../binutils/objdump --private-header test2.so
>
>test2.so:     file format elf32-i386
>
>[...]
>
>Version definitions:
>1 0x01 0x0ca7523f test2.so
>
>2 0x00 0x0c3b2451 GNU_1.1
>
>
>Version references:
>  Interfaces required from ./test.so:
>    0x03d27931 0x00 03 SUNW_1.3a

This displays the version definitions and the version requirements for test2.so. It shows the base definition (#1) with the SONAME specified. It shows one additional definition, GNU_1.1, which is internally marked as #2. It also shows one version reference, which is shown as #3 if you look in the 3rd field just prior to the SUNW_1.3a.

Note: This is just a first attempt at providing versioning information through objdump.
In the long run, something more sophisticated would probably be a good idea - look at the usage of pvs under Solaris 2.5 for an example of the possibilities.

--------------------------------------------------------------------------------

Full description of Sun's versioning scheme.

Sun has an approach to symbol versioning which became available with Solaris 2.5. It isn't well documented, but if you look long and hard you can find it. You need to start with the disc labeled "2.5 Software Developer Kit". The one we have is 704-4927-10, Rev A, Nov, 1995. Once you have this mounted, then look for the directory:

        /sdk_2_5/common/SUNWabsdk/reloc/$PostScriptDEST/LLM

Look for the file "05.Versioning" - this is a PostScript file that describes their scheme in a good bit of detail. For those of you who have access to this, I strongly suggest you print this out. Their description is a lot more thorough than mine, and it is probably presented in a much clearer way.

Unfortunately this document is copyrighted by Sun, and until we find a good source of the documents that is freely redistributable, I cannot make copies of it available.

-------------

To begin with, Sun's approach centers around a mapfile which is passed to the linker when you build a shared library. In this file, you have the ability to declare what version the library is, what version each symbol is, what interfaces are defined, etc, etc. When you link against a shared library containing versioning information, the linker makes note of which interfaces are required by the functions that the application is calling from the library, and writes a description of the required interfaces into the executable. At startup time, the dynamic loader will compare what the application needs against what the shared library provides, and complain about any discrepancy.

The first thing this scheme provides is effective minor number version control.
The idea is that if a new library adds a new function, it will be in a new version of an interface, and if the application needs this function, it will be listed in the requirements for the executable.

This is probably not at all clear right now, so some examples are in order. I will show you some mapfiles, and explain what is possible and how you use them. I am only showing how these are used in the Sun scheme. The extensions that we are adding will be described in more detail later on.

For a brand new shared library, a mapfile could look something like this:

VERS_1.0 {
         global:
                 foo1;
                 foo2;
         local:
                 *;
};

--------------------------------------------------------------------------------

Error conditions:

+ If a decoration appears which doesn't correspond to an actual tree node, this is a fatal error condition.

+ Since symbols like "foob$VERS_1.1" are meant for internal use only, it should be an error if the name "foob$VERS_1.1" appears in the `global' part.

+ It should not be an error to mention the "foob$VERS_1.1" symbol in the `local' part of the VERS_1.1 section.

+ But it should be an error to mention it in a section other than VERS_1.1.

+ If a symbol is mentioned as being in a section (version) but no decorated name exists, this is an error.

+ It should be possible to mention a symbol in more than one version section, but in this case there must be a name in a .o file which exactly matches this section. As a special case, it is not allowed for a name to be mentioned in more than one section without being versioned.

+ Mentioning any versioned symbol in .o files is optional. They will by default be placed in the appropriate `local' part.

________________________________________________________________________________

Internals of Sun's implementation of versioning.

Sun accomplished versioning by adding 3 sections to shared libraries and executables linked against shared libraries.
Not all sections are required to be present, but it will become evident which ones are required in each case.

Here is a summary of the particulars of the sections. The contents of the sections will be discussed later on. All 3 sections are named ".SUNW_version", and all three sections are placed in read-only memory.

                sh_type
type            value        sh_entsize   sh_link   sh_info
------          ----------   ----------   -------   -------
SUNW_verneed    0x6ffffffe   0            .dynstr   #need
SUNW_verdef     0x6ffffffd   0            .dynstr   #def
SUNW_versym     0x6fffffff   2            .dynsym   0

In addition, up to 4 new entries are added to the .dynamic section. These are:

#define DT_VERDEF       0x6ffffffc      /* Points to SUNW_verdef */
#define DT_VERDEFNUM    0x6ffffffd      /* #def */
#define DT_VERNEED      0x6ffffffe      /* Points to SUNW_verneed */
#define DT_VERNEEDNUM   0x6fffffff      /* #need */

Now I will describe the contents of the individual sections.

-----------
SUNW_versym:

This section is an array of short ints that runs in parallel to .dynsym. In Sun's implementation, the values stored here are only non-zero for symbols that are defined within a particular library. For external references, the value is always 0. The value that is present in the entry is the version index (vd_ndx) of an entry in the SUNW_verdef section.

-----------
SUNW_verdef:

This section contains a condensed version of the mapfile that is supplied to Sun's linker. It describes the tree structure that is in the mapfile. All strings are given as offsets into .dynstr. There are two important structures here:

/*
 * Verdef and Verneed (via Veraux) flags values.
 */
#define VER_FLG_BASE            0x1     /* version definition of file itself */
#define VER_FLG_WEAK            0x2     /* weak version identifier */

/*
 * Verdef version values.
 */
#define VER_DEF_NONE            0       /* Ver_def version */
#define VER_DEF_CURRENT         1
#define VER_DEF_NUM             2

typedef struct {                        /* Version Definition Structure. */
        Elf32_Half      vd_version;     /* this structure's version revision */
        Elf32_Half      vd_flags;       /* version information */
        Elf32_Half      vd_ndx;         /* version index */
        Elf32_Half      vd_cnt;         /* no. of associated aux entries */
        Elf32_Word      vd_hash;        /* version name hash value */
        Elf32_Word      vd_aux;         /* no. of bytes from start of this */
                                        /*      verdef to verdaux array */
        Elf32_Word      vd_next;        /* no. of bytes from start of this */
} Elf32_Verdef;                         /*      verdef to next verdef entry */

typedef struct {                        /* Verdef Auxiliary Structure. */
        Elf32_Addr      vda_name;       /* first element defines the version */
                                        /*      name. Additional entries */
                                        /*      define dependency names. */
        Elf32_Word      vda_next;       /* no. of bytes from start of this */
} Elf32_Verdaux;                        /*      verdaux to next verdaux entry */

-------------              -----------                -----------
|  Verdef   | -> vd_aux -> | Verdaux | -> vda_next -> | Verdaux | -> vda_next ->
-------------              -----------                -----------
      |
      | vd_next
      V
-------------              -----------                -----------
|  Verdef   | -> vd_aux -> | Verdaux | -> vda_next -> | Verdaux | -> vda_next ->
-------------              -----------                -----------
      |
      | vd_next
      V

The values of the fields are:

vd_version == VER_DEF_CURRENT
vd_flags   == VER_FLG_BASE if this is the first verdef structure
              VER_FLG_WEAK if this is a 'weak' version definition.
vd_ndx     == Index. This is the 'version' number used in the SUNW_versym section.
vd_cnt     == Number of Verdaux entries in linked list.
vd_hash    == elf_hash(string in first Verdaux structure)
vd_aux     == Number of bytes to first Verdaux structure
vd_next    == Number of bytes to next Verdef structure

vda_name   == Offset into .dynstr of the name.
vda_next   == Number of bytes to next Verdaux structure

This description is really somewhat incomplete. The first Verdaux always has special meaning - it is the name of the shared library for the base version, or it is the name of the version node we are defining if this is after the first Verdef structure. The entries after the first Verdaux are used to point to version nodes that this version node depends upon.

An example is in order.
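Before turning to a concrete mapfile, the traversal itself can be sketched in C. This is only an illustration under stated assumptions: the struct and function names (sketch_verdef, sketch_verdaux, verdef_name_offset) are invented here, the structs mirror the field layout listed above using fixed-width types, and the section contents are assumed to already be in host byte order. It follows vd_next from one Verdef to the next and, for the requested version index, follows vd_aux to the first Verdaux to obtain the .dynstr offset of the version name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Illustrative mirrors of the Verdef / Verdaux layouts (not a real header). */
typedef struct {
    uint16_t vd_version;
    uint16_t vd_flags;
    uint16_t vd_ndx;
    uint16_t vd_cnt;
    uint32_t vd_hash;
    uint32_t vd_aux;    /* byte offset from this Verdef to its first Verdaux */
    uint32_t vd_next;   /* byte offset from this Verdef to the next Verdef */
} sketch_verdef;

typedef struct {
    uint32_t vda_name;  /* offset into .dynstr */
    uint32_t vda_next;  /* byte offset to the next Verdaux */
} sketch_verdaux;

/*
 * Walk up to `ndef` Verdef records in the `len`-byte buffer `sec` and return
 * the .dynstr offset of the name of version index `ndx`.  Returns -1 if the
 * index is not defined or an offset would run past the end of the buffer.
 */
static long verdef_name_offset(const unsigned char *sec, size_t len,
                               unsigned ndef, unsigned ndx)
{
    size_t off = 0;

    assert(sec != NULL);
    for (unsigned i = 0; i < ndef; i++) {
        sketch_verdef vd;

        if (off > len || len - off < sizeof vd)
            return -1;
        memcpy(&vd, sec + off, sizeof vd);   /* avoids unaligned access */

        if (vd.vd_ndx == ndx) {
            sketch_verdaux aux;
            size_t aoff = off + vd.vd_aux;

            if (vd.vd_cnt == 0 || aoff > len || len - aoff < sizeof aux)
                return -1;
            memcpy(&aux, sec + aoff, sizeof aux);
            return (long)aux.vda_name;       /* first Verdaux names the node */
        }
        if (vd.vd_next == 0)                 /* last record in the chain */
            break;
        off += vd.vd_next;
    }
    return -1;
}
```

The remaining Verdaux entries of a node (its dependencies) could be visited the same way by adding vda_next to the current Verdaux offset vd_cnt - 1 more times.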
Consider the following mapfile:

>SUNW_1.1 {                     # Release X
>        global:
>                foo1;
>        local:
>                *;
>};
>
>SUNW_1.2 {                     # Release X+1
>        global:
>                foo2;
>} SUNW_1.1;
>
>SUNW_1.2.1 { } SUNW_1.2;       # Release X+2
>
>
>SUNW_1.3a {                    # Release X+3
>        global:
>                bar1;
>} SUNW_1.2;
>
>SUNW_1.3b {                    # Release X+3
>        global:
>                bar2;
>} SUNW_1.2;
>
>
>SUNW_1.3c {                    # Release X+3
>        global:
>                bar2;
>} SUNW_1.3a SUNW_1.3b;

Here is a dump of the SUNW_verdef section that comes from this mapfile:

vd_version vd_flags vd_ndx vd_cnt vd_hash    vd_aux vd_next
---        ------   --     --     ---------- ---    ---
    vda_name vda_next (vda_name)
    ---      ---      -------------
001        0x0001   01     01     0x0aca75ef 020    028      (Verdef)
    037      000      test.so                                (Verdaux)
001        0x0000   02     01     0x0a3d2791 020    028
    045      000      SUNW_1.1
001        0x0000   03     02     0x0a3d2792 020    036
    054      008      SUNW_1.2
    045      000      SUNW_1.1
001        0x0002   04     02     0x0d279f21 020    036
    063      008      SUNW_1.2.1
    054      000      SUNW_1.2
001        0x0000   05     02     0x03d27931 020    036
    074      008      SUNW_1.3a
    054      000      SUNW_1.2
001        0x0000   06     02     0x03d27932 020    036
    084      008      SUNW_1.3b
    054      000      SUNW_1.2
001        0x0000   07     03     0x03d27933 020    000
    094      008      SUNW_1.3c
    084      008      SUNW_1.3b
    074      000      SUNW_1.3a

-----------
SUNW_verneed:

This section is created to indicate requirements from the libraries that we are linked against. It can appear in either shared libraries or in executables. As with the SUNW_verdef section, there are two structures that are used, and they are arranged in a very similar manner to the SUNW_verdef.

/*
 * Verneed version values.
 */
#define VER_NEED_NONE           0       /* Ver_need version */
#define VER_NEED_CURRENT        1
#define VER_NEED_NUM            2

typedef struct {                        /* Version Requirement Structure. */
        Elf32_Half      vn_version;     /* this structure's version revision */
        Elf32_Half      vn_cnt;         /* no. of associated aux entries */
        Elf32_Addr      vn_file;        /* name of needed dependency (file) */
        Elf32_Word      vn_aux;         /* no. of bytes from start of this */
                                        /*      verneed to vernaux array */
        Elf32_Word      vn_next;        /* no. of bytes from start of this */
} Elf32_Verneed;                        /*      verneed to next verneed entry */

typedef struct {                        /* Verneed Auxiliary Structure. */
        Elf32_Word      vna_hash;       /* version name hash value */
        Elf32_Half      vna_flags;      /* version information */
        Elf32_Half      vna_other;
        Elf32_Addr      vna_name;       /* version name */
        Elf32_Word      vna_next;       /* no. of bytes from start of this */
} Elf32_Vernaux;                        /*      vernaux to next vernaux entry */

In this case, the first Vernaux structure has no special meaning.

vn_version == VER_NEED_CURRENT
vn_cnt     == Number of Vernaux structures
vn_file    == String table offset of file name
vn_aux     == Byte offset to first element of vernaux array
vn_next    == Byte offset to next Verneed structure.

vna_hash   == elf_hash(vna_name)
vna_flags  == Flags of referenced version.
vna_other  == Unused at the moment.
vna_name   == String table offset to needed node
vna_next   == Byte offset to next Vernaux structure

Here is a dump of the elements of the structures. For the Verneed, the order is vn_version, vn_cnt, vn_aux, vn_next, vn_file, and the actual string associated with vn_file. For the Vernaux, the ordering is vna_hash, vna_flags, vna_other, vna_name, vna_next, and the string associated with vna_name.

001 002 016 000 131 ../test1/foo.so
    0x0a3d2792 000 000 147 016 SUNW_1.2
    0x0d279f21 002 000 156 000 SUNW_1.2.1

--------------------------------------------------------------

Dynamic linker requirements - Sun's implementation.

The dynamic linker doesn't have all that much to do at runtime. In effect, all it is doing is matching up things in the SUNW_verdef and SUNW_verneed sections to ensure that the libraries are supplying up-to-date and complete interfaces.

A dynamic linker that implements *just* Sun's version handling will not do anything different until all of the files are mapped. Once this has taken place, it will go through the SUNW_verneed structures, and find the Verdef structure for the file that vn_file points to.
Then, for each Vernaux structure, it will walk the Verdef structures in the associated file, and try to find a definition with the same name. Given that hash values are given in the structures, it is clear that the lookup will first consist of comparing hash values until we find a match. In all probability, we have found the correct match, but I suppose that a strcmp would still be in order just in case.

Note that the first Verdaux structure attached to each Verdef doesn't represent a dependency - it holds the name of the version node itself. Also, the first Verdef structure is for the implied base version. Thus it has a version number of 1, and it has only one Verdaux. The associated string is the SONAME of the library itself. Hopefully this is clear from the above example.

--------------------------------------------------------------

Shortcomings of Sun's approach - changes we need to make.

1) There is no mechanism for the dynamic loader to locate the array of short ints that runs parallel to .dynsym. To solve this, a new .dynamic entry was added:

#define DT_VERSYM       0x6ffffff0

This entry points to the beginning of the .GNU_version section, which contains the array of short ints.

2) We need to support the notion of a 'default' version of a function. We need to mark given functions as 'hidden', which means that unless you explicitly reference the function by version name, you won't bind to it. The 'default' version is bound to all references which don't specify a version at all.

Internally this is encoded in the .GNU_version section. Each entry in this section takes the meaning:

struct vers {
        unsigned short ver_no:15;
        unsigned short hidden:1;
};

Thus the high bit of each entry in the array is used to indicate that a symbol is an older deprecated interface.

In Sun's usage, the following values are used for the version number:

        0       For definitions, always a local symbol
        1       For definitions, always the implied base version.
        2+      For definitions, the defined versions.
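As a hedged sketch of this encoding, the entry can be taken apart with explicit masks instead of a bit-field, since C leaves the placement of bit-fields within a storage unit to the implementation. The macro, enum, and function names below are illustrative only and are not taken from any existing header; the classification applies to symbol definitions as listed above.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative names, assuming the high bit marks a hidden version. */
#define SKETCH_VERSYM_HIDDEN 0x8000u   /* bit 15: hidden (non-default) */
#define SKETCH_VERSYM_MASK   0x7fffu   /* bits 0-14: version number */

enum sketch_kind { SKETCH_LOCAL, SKETCH_BASE, SKETCH_VERSIONED };

/* Version number stored in the low 15 bits of an entry. */
static unsigned versym_ver_no(uint16_t raw)
{
    return raw & SKETCH_VERSYM_MASK;
}

/* Non-zero when the entry is marked hidden. */
static int versym_hidden(uint16_t raw)
{
    return (raw & SKETCH_VERSYM_HIDDEN) != 0;
}

/* Build an entry from its two parts; the version must fit in 15 bits. */
static uint16_t versym_encode(unsigned ver_no, int hidden)
{
    assert(ver_no <= SKETCH_VERSYM_MASK);
    return (uint16_t)(ver_no | (hidden ? SKETCH_VERSYM_HIDDEN : 0u));
}

/* Classify a definition: 0 is local, 1 is the base version, 2+ is a node. */
static enum sketch_kind versym_kind(uint16_t raw)
{
    unsigned n = versym_ver_no(raw);

    if (n == 0)
        return SKETCH_LOCAL;
    if (n == 1)
        return SKETCH_BASE;
    return SKETCH_VERSIONED;
}
```

For instance, an entry of 0x8003 would decode to version number 3 with the hidden bit set, while 0x0002 is version number 2 and not hidden.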
The 'ver_no' field works the same way as it does under Sun's scheme for symbol definitions. For symbol references, the story is different - Sun always uses a version number of 0 in this field for references.

To solve this, the version references are numbered in much the same way that the definitions are, and we start counting from the place where the definitions left off. Here is an example (objdump --private-headers displays this information):

>Version definitions:
>1 0x01 0x0ca7523f test2.so
>
>2 0x00 0x0c3b2451 GNU_1.1
>
>
>Version references:
>  Interfaces required from ./test.so:
>    0x03d27931 0x00 03 SUNW_1.3a

There are effectively two definitions - the base and GNU_1.1, and these are numbered 1 and 2. There is one reference, and this is numbered 3. Thus it should always be easy to determine whether a given version number indexes into the definitions or the references. If there are no version definitions, version references start at 2. The version number of each reference is stored in the vna_other field.

In the case where the reference is to an object for which no version information exists, the version number will continue to be 0 as it is in Sun's implementation. The number 1 is used for definitions in the base description, so 2 is the first free index.

The number of references and definitions can be found by looking in the sh_info field of the section header for .GNU_version_d and .GNU_version_r. The same numbers are also available at runtime via the DT_VERDEFNUM and DT_VERNEEDNUM .dynamic entries. Thus there is no need to ever count them - a quick compare should always tell us what we need.

--------------------------------------------------------------------

Changes to the dynamic linker that need to be made.

1) Implement Sun's level of checking. Here we are just comparing the SUNW_verdef and SUNW_verneed sections to make sure we have the correct interfaces for everything.

2) Look for DT_VERSYM in each object, and make a note of its location.
3) When binding symbols, look up the version number of the reference. As we find suitable names, look up the version number of the potential definition. The following pseudo-code is probably pretty close to what we want:

        if( ref_no == 0 && !def_hidden )
        {
          /*
           * For Sun compatibility.  Ref always == 0 there so we take
           * the first non-hidden version to be the correct one.
           * I *think* this is the most correct thing we can do, but
           * it obviously leads to breakage if the interface changes.
           * Sun doesn't allow for interface changes, so this is probably
           * the most correct thing to do.
           */
          goto accept;
        }
        else
        {
          Elf32_External_Version_Def     * df;
          Elf32_External_Version_DefAux  * dfa;
          Elf32_External_Version_NeedAux * na;

          /* obj1 holds the candidate definition, obj2 the reference. */
          df  = LookupDef(obj1, def_no);
          dfa = LookupDefAux(obj1, def_no);
          na  = LookupNeedAux(obj2, ref_no);
          if(   df->vd_hash == na->vna_hash
             && strcmp(strtab(obj2, na->vna_name),
                       strtab(obj1, dfa->vda_name)) == 0)
            goto accept;
        }

One thing we should try to keep - any shared libraries built by Sun's linker should work correctly with our stuff. The characteristics of these libraries are that none of the symbols will be hidden, and references will always be marked with a version number of 0. Our libraries will have both hidden and non-hidden versions of symbols, and references will reflect a non-zero index into the .GNU_version_r section. Even in libraries that we generate, an attempt to reference a non-versioned library will result in a stored version number of 0 for symbol references.

4) Fix dlsym() to parse its argument, and grab specific versions of functions. This is more or less just a matter of picking out the '@' character and doing the right thing with it.

5) The Sun versioning scheme stores the full 32 bits of elf_hash() for the textual names of the version nodes. Thus comparing two versions should start by comparing the hash codes - if these match, the odds are great that we have a full match, but I suspect that we still need to do a strcmp() or we run the risk of a mismatch.
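The comparison in item 5 can be sketched as follows. This is a minimal illustration rather than the dynamic linker's actual code: the function names sketch_elf_hash and version_matches are invented for this example, and the hash is the standard System V ELF hash function, which is assumed to be what elf_hash() refers to here.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Standard System V ELF hash of a NUL-terminated name. */
static uint32_t sketch_elf_hash(const char *name)
{
    uint32_t h = 0, g;

    assert(name != NULL);
    while (*name) {
        h = (h << 4) + (unsigned char)*name++;
        g = h & 0xf0000000u;
        if (g)
            h ^= g >> 24;
        h &= ~g;            /* keep the result within 28 bits */
    }
    return h;
}

/*
 * Return 1 when a required version and a defined version agree.
 * The cheap hash comparison rejects most mismatches; strcmp() then
 * guards against two different names that happen to share a hash.
 */
static int version_matches(uint32_t need_hash, const char *need_name,
                           uint32_t def_hash, const char *def_name)
{
    if (need_hash != def_hash)
        return 0;
    return strcmp(need_name, def_name) == 0;
}
```

As a cross-check against the dumps shown earlier, this function yields 0x0a3d2791 for "SUNW_1.1" and 0x0c3b2451 for "GNU_1.1", the same values that appear in the vd_hash column and in the objdump output.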