diff --git a/content/designs/application-bundle-metadata.md b/content/designs/application-bundle-metadata.md
new file mode 100644
index 0000000000000000000000000000000000000000..689e04be843dbc5fabfbcb0f6275de91348e1e6d
--- /dev/null
+++ b/content/designs/application-bundle-metadata.md
@@ -0,0 +1,583 @@
+---
+title: Application bundle metadata
+short-description: Associating arbitrary metadata with entire app-bundles (implemented)
+authors:
+  - name: Simon McVittie
+---
+
+# Application bundle metadata
+
+This document extends the Apertis [Applications concept design]
+to cover metadata about [application bundles][Application bundle] (app-bundles).
+
+## Terminology and concepts
+
+See the [Apertis glossary] for background information on terminology.
+Apertis-specific jargon terms used in this document are hyperlinked
+to that glossary.
+
+## Use cases
+
+These use-cases are not exhaustive: we anticipate that other uses will
+be found for per-application-bundle metadata. At the time of writing,
+this document concentrates on use-cases associated with assigning priorities
+to requests from an app-bundle to a platform service.
+
+### Audio management priorities
+
+Assume that the Apertis audio management component (which is outside
+the scope of this document) assigns priorities to audio streams based
+on [OEM]-specific rules, potentially including user configuration.
+
+Suppose the author of an app-bundle has a legitimate reason to have their
+audio streams played with an elevated priority, for example because their
+app-bundle receives voice calls which should take precedence over music
+playback.
+
+Also suppose a different, malicious app-bundle author wishes to interrupt
+the driver's phone call to play an advertisement or other distracting sound
+as an audio stream.
+
+The Apertis system must be able to distinguish between the two app-bundles,
+so that requests for an elevated priority from the first app-bundle
+can be obeyed, while requests for an elevated priority from the second
+app-bundle are rejected.
+
+We assume that the app-bundles have been checked by an app-store
+curator before publication, and that the first app-bundle declares a special
+[permission][Permissions] in its app manifest, resulting in the app framework
+allowing it to flag its audio stream in ways that will result in it being
+treated as important, and hence superseding less important audio.
+Conversely, if the second app-bundle had declared that permission, we
+assume that the app-store curator would have recognised this as inappropriate
+and rejected its publication.
+
+### Notification and dialog priorities
+
+Assume that the Apertis compositor (which is outside the scope of this
+document) assigns priorities to notifications based on [OEM]-specific rules,
+potentially including user configuration. Depending on the OEM's chosen
+UX design, app-modal and system-modal dialogs might be treated as
+visually similar to notifications; if they are, the compositor author
+might also wish to assign priorities from the same ranges to dialogs.
+
+Similar to the [][Audio management priorities] use case, app-bundles that
+have a legitimate reason for their notifications or dialogs to be
+high-priority must be able to achieve this, but malicious app-bundles
+whose authors aim to misuse this facility must not be able to achieve
+an elevated priority.
+
+### App-bundle labelling
+
+A UX designer might wish to arrange for all user interface elements
+associated with a particular app-bundle (including notifications, windows,
+its representation in lists of installed app-bundles, and so on)
+to be marked with an unambiguous indication of the app-bundle that created
+them, such as its name and icon.
+
+In particular, the Compositor Security concept design
+(which is [work in progress][Compositor Security design] at the time of
+writing) calls for windows and notifications to be visually associated
+with the app-bundle that created them, so that malicious app-bundle authors
+cannot make the user believe that information presented by the
+malicious app-bundle came from a different app-bundle (*output integrity*),
+and also cannot convince the user to enter input into the malicious
+app-bundle that they had only intended to go to a different app-bundle
+(a *trusted input path*, providing *input confidentiality* for the
+non-malicious app-bundle).
+
+Note that this mechanism will not be effective unless either the app-store
+curator avoids accepting app-bundles with the same or confusingly similar
+names or icons, or the UX designer disambiguates app-bundles using
+something that is guaranteed to be unique, such as the app-bundle ID
+(which is not necessarily a desirable or user-friendly UX). This applies
+wherever app-bundles are listed, such as the app store's on-device user
+interface, the app-store's website, or a list of installed app-bundles
+in the device's equivalent of Android's Settings → Apps view.
+
+## Requirements
+
+### App-bundle metadata
+
+An Apertis platform library to read app bundle metadata must be made
+available to platform components, featuring at least these API calls:
+
+* given a bundle ID, return an object representing the metadata
+* list all installed bundles (either built-in or store) with their
+  IDs and metadata
+* emit a signal whenever the list of installed bundles changes,
+  for example because a store app bundle was installed, removed,
+  upgraded or rolled back (simple change-notification)
+
+### Labelling requirements
+
+Each app-bundle must contain a human-readable name in international English.
+It must also be possible for an app-bundle to contain translated versions
+of this name for other languages and locales, with the international English
+version used in locales where a translation is not provided.
+
+Each app-bundle must be able to contain the name of the authoring company
+or individual.
+
+Each app-bundle must contain a version number.
+To let the application manager make appropriate decisions, all
+application bundles must use a format for their version strings that can
+be compared in a standard way. How an application developer chooses to set
+the version numbers is ultimately their decision, but Apertis must be
+able to determine whether one version number is higher than another.
+
+Collabora recommends requiring version numbers to be dotted-decimal
+(one or more decimal integers separated by single dots), with
+“major.minor.micro” (for example `3.2.4`) recommended but not strictly
+required.
+
+There will be a “store version” appended to the
+version string after a dash, similar to the versioning scheme used by
+`dpkg`; for example, in the version string `3.2.4-1`, the `1` is the store
+version. The store version allows the store to push an update even if
+the application version hasn't changed, and it will be the least
+significant figure.
For example, version `2.2.0-1` is newer than version
+`2.1.99-4`. The store version will restart at 1 any time the application
+version is increased, and will be incremented if a new intermediate
+release is required.
+
+### Secure identification
+
+Apertis [platform] services that receive requests from an unknown process
+must be able to identify which app-bundle the process belongs to. To support
+this, the request must take place via a channel that guarantees integrity
+for that process's identification: it must not be possible for a
+malicious process to impersonate a process originating from a different
+app-bundle.
+
+### Audio stream and notification requirements
+
+The information required by the audio manager must be represented as
+one or more metadata key-value pairs that can be read from the app bundle
+metadata.
+
+The information required by the notification implementation must be
+represented as one or more metadata key-value pairs that can be read
+from the app bundle metadata.
+
+We anticipate that audio management and notifications will not always
+assign the same priority to each app-bundle; therefore, it must be
+possible for the metadata keys used by audio management and those
+used by notifications to be distinct.
+
+### App-store curator oversight
+
+It must be straightforward for an app-store curator to inspect the
+metadata that is present in an app-bundle, for example so that they can
+refuse to publish app-bundles that ask for audio or notification priorities
+that they have no legitimate reason to use, or for which the name,
+icon or other information used for [][App-bundle labelling] is misleading.
+
+### Store app-bundle confidentiality
+
+Ordinary unprivileged programs in store app-bundles must not be able
+to use these API calls to enumerate other installed store app-bundles.
+For example, if those API calls are implemented in terms of a D-Bus
+service, it must reject method calls from store app-bundles, or if those
+API calls are implemented in terms of reading the filesystem directly,
+store app-bundles' AppArmor profiles must not allow reading the necessary
+paths.
+
+*Non-requirement*: it is acceptable for ordinary unprivileged programs to
+be able to enumerate installed built-in app-bundles. Built-in app-bundles
+are part of the platform, so there is no expectation of confidentiality
+for them.
+
+### Extension points
+
+We anticipate that vendors will wish to introduce non-standardized
+metadata, either as a prototype for future standardization or to support
+vendor-specific additional requirements. It must be possible to include
+new metadata fields in an app-bundle without coordination with a central
+authority. For example, this could be achieved by namespacing new
+metadata fields using a DNS name ([as is done in D-Bus][D-Bus names]),
+namespacing them with a URI ([as is done in XML][Namespaces in XML]),
+or using the [`X-Vendor-NewMetadataField` convention][X-Vendor] (as is done
+in email headers, HTTP headers and
+[freedesktop.org `.desktop` files][Desktop Entry Specification]).
+
+### Future directions
+
+#### Platform API requirements
+
+The application bundle metadata should include a minimum system version
+(API version) required to run the application, for example to prevent the
+installation of an application that requires at least Apertis 16.12 in an
+Apertis 16.09 environment. A specific versioning model for the Apertis API has
+not yet been defined.
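+
+Version comparison itself is already well-defined by the scheme described
+in [][Labelling requirements]. As a minimal sketch (an illustration only,
+not the platform implementation; the function names are hypothetical),
+dotted-decimal versions with a dpkg-style store version could be compared
+like this:
+
+---
+def parse_version(version):
+    """Split a version string such as '3.2.4-1' into its dotted-decimal
+    application version and its integer store version, following the
+    scheme recommended in this document."""
+    if "-" in version:
+        app_version, store_version = version.rsplit("-", 1)
+    else:
+        app_version, store_version = version, "0"
+    return [int(part) for part in app_version.split(".")], int(store_version)
+
+def version_newer(a, b):
+    """Return True if version string a is newer than version string b.
+    The application version is compared first; the store version is the
+    least significant figure."""
+    return parse_version(a) > parse_version(b)
+
+# The example from this document: 2.2.0-1 is newer than 2.1.99-4.
+assert version_newer("2.2.0-1", "2.1.99-4")
+---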
+
+#### Declarative permissions
+
+The application bundle metadata should include simple, declarative
+[permissions] which can be used to generate an AppArmor profile
+in an automated way. The [Permissions concept design] tracks this work.
+
+#### Declaring system extensions
+
+The [Applications concept design] calls for app-bundle metadata to describe
+the types of [system extension][] (themes, addons, plugins, etc.) provided
+by an app-bundle. There is currently no detailed specification for this.
+
+AppStream upstream XML already supports declaring that a component
+(app-bundle) is an addon to another component (via the `addon` type)
+or to the system as a whole (via the `<provides>` element). There is no
+specific metadata to describe themes; discussion has been started in
+[AppStream issue 67](https://github.com/ximion/appstream/issues/67).
+
+#### Declaring where applications store non-essential files
+
+The [Applications concept design] suggests that application bundle metadata
+might declare where applications store non-essential files, so that the
+system can delete those files when disk space runs out.
+
+#### Placing symbolic links in centralized locations
+
+The [Applications concept design] suggests that for applications not
+originally designed for Apertis, which might write to locations like
+`~/.someapp`, application bundle metadata
+might declare where the platform must create symbolic links to cause
+those applications to read and write more appropriate locations on Apertis,
+for example
+`~/.someapp → /var/Applications/com.example.SomeApp/users/${UID}/data`.
+
+#### Declaring an EULA
+
+App-bundle metadata should include a way to specify an EULA which the
+user must agree with before the application bundle will be installed.
+See [AppStream issue 50](https://github.com/ximion/appstream/issues/50)
+for work on this topic in the AppStream specification.
+
+Other files in the
+license directory of the bundle but not mentioned in this way will
+still be copied to the device, and the HMI components must provide some
+way to view that information later.
+
+#### Placeholder icons
+
+Since the installation process is not instant, a placeholder icon should be
+provided and specified in the version of the application bundle metadata
+that is downloaded from the application store. This icon will be copied into
+the store directory by the application store during publication. It will
+be displayed by the application manager instead of the application until
+the installation is completed. The application launcher will also be
+able to display a progress indicator or – if multiple applications are
+being installed – a position in the install queue.
+
+#### Platform component metadata
+
+Although it is not a requirement at this stage, we anticipate that it
+might be useful in the future to be able to associate similar metadata
+with platform components, such as the Newport download manager.
+
+## Other systems
+
+This section contains a very brief overview of the analogous functionality
+in other open-source platforms.
+
+### freedesktop.org AppStream
+
+Several open-source desktop platforms such as GNOME and KDE, and Linux
+distributions such as Ubuntu and Fedora, have adopted [AppStream]
+as a shared format for software component metadata, complementing the
+use of [`.desktop` files][Desktop Entry Specification] for
+[entry points].
+ +The AppStream specification refers to *components*, which are a +generalization of the same concept as Apertis app-bundles, and can +include software from various sources, including traditional distribution +packages and bundling technologies such as [][Flatpak]. + +#### Flatpak + +The [Flatpak] framework provides user-installable applications +analogous to Apertis app-bundles. It uses [AppStream] for app-bundle +metadata, together with [`.desktop` files][Desktop Entry Specification] for +entry points. + +### Snappy + +Ubuntu [Snappy] packages (snaps) are also analogous to Apertis app-bundles. +[Their metadata][Snappy metadata] consists of a Snappy-specific YAML +file describing the snap, again together with +[`.desktop` files][Desktop Entry Specification] +describing entry points. + +### Android + +Android *apps* are its equivalent of Apertis app-bundles. Each app has a +single [App manifest][Android App Manifest] file, +which is an XML file with Android-specific contents, and describes both +the app itself, and any *activities* that it provides (activities +are analogous to Apertis [entry points]). + +## Design recommendations + +The [Apertis Application Bundle Specification] describes the metadata +fields that can appear in an application bundle and are expected to +remain supported long-term. This document provides rationale for those +fields, suggested future directions, and details of functionality that +is not necessarily long-term stable. + +### App-bundle metadata design + +We anticipate that other designs involving app-bundles will frequently +require other metadata beyond the use-cases currently present in this +document, for example categories. +As such, we recommend introducing a general metadata file into built-in +and store app-bundles. + +This metadata file could have any syntax and format that is readily parsed. +To minimize duplicate effort, we recommend using [AppStream XML][AppStream], +a format designed to be shared between desktop environments such as GNOME +and KDE, and between Linux distributions such as Ubuntu and Fedora. + +Each built-in app bundle should install an [AppStream upstream XML] +metadata file. If the built-in app bundle has [entry points], +then its metadata file must be made available as +`/usr/share/metainfo/${bundle_id}.appdata.xml` (where `${bundle_id}` +represents its bundle ID), and its `<id>` must be +`<id type="desktop">${entry_point_id}.desktop</id>` where `${entry_point_id}` +represents its primary entry point (typically the same as the bundle ID). +`/usr/share/metainfo/${bundle_id}.appdata.xml` will typically be a symbolic +link to +`/usr/Applications/${bundle_id}/share/metainfo/${bundle_id}.appdata.xml`. + +If the built-in app bundle has no entry points (for example a theme), +then its metadata file must be available as +`/usr/share/metainfo/${bundle_id}.metainfo.xml` (where `${bundle_id}` +represents its bundle ID), and its `<id>` must be the same as its bundle ID. +Again, this would typically be a symbolic link to a corresponding path +in `/usr/Applications/${bundle_id}`. + +Each store app bundle should install an AppStream upstream XML metadata +file into `/Applications/${bundle_id}/share/metainfo/${bundle_id}.appdata.xml` +or `/Applications/${bundle_id}/share/metainfo/${bundle_id}.metainfo.xml` +(depending on whether it has entry points), with contents +corresponding to those specified for built-in app bundles. 
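+
+As an illustrative sketch (the bundle ID `com.example.ShoppingList` and all
+field values here are hypothetical examples), a minimal AppStream upstream
+XML file for a store app bundle with an entry point, installed as
+`/Applications/com.example.ShoppingList/share/metainfo/com.example.ShoppingList.appdata.xml`,
+might look like this:
+
+---
+<?xml version="1.0" encoding="UTF-8"?>
+<component type="desktop">
+  <id type="desktop">com.example.ShoppingList.desktop</id>
+  <metadata_license>CC0-1.0</metadata_license>
+  <name>Shopping List</name>
+  <summary>Keep track of groceries to buy</summary>
+</component>
+---
+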
+For [][Store app-bundle confidentiality], a store app-bundle's
+AppArmor profile must not allow it to read the contents of a different
+store app-bundle, and in particular its AppStream metadata.
+
+AppStream upstream XML is normally also searched for in the
+[deprecated path][deprecated AppStream path]
+`/usr/share/appdata`, but for simplicity, we do not require
+the `share/appdata/` directory to be processed for application bundles.
+Since existing application bundles do not contain it, this does not create
+a compatibility problem.
+
+For [][App-store curator oversight], if the implementation reads
+other sources of
+metadata from a store app-bundle (for example the `.desktop` entry points
+provided by the app-bundle), then the implementation must document those
+sources. The app-store curator must inspect all of those sources.
+This requirement does not apply to built-in app-bundles, which are assumed
+to have been checked thoroughly by the platform vendor at the time the
+built-in app-bundle was integrated into the platform image.
+
+The Apertis platform must provide cache files whose timestamps
+change whenever there is a change to the set of store or built-in app
+bundles, or to those bundles' contents.
+These cache files should be monitored by the
+[libcanterbury-platform][Canterbury] library, using the standard
+`inotify` mechanism. Any cache files that contain store app-bundles must
+not be readable by a store app-bundle, to preserve
+[][Store app-bundle confidentiality].
+
+The other APIs that are required are straightforward to implement in the
+[libcanterbury-platform][Canterbury] library by reading from the cache files.
+Because this is done in a library (as opposed to a D-Bus service),
+the implementation of these APIs will run with the privileges of the
+process that is requesting the information: in particular, if an
+unprivileged process attempts to read the cache files, this will be
+prevented by its AppArmor profile, regardless of whether it is using
+libcanterbury-platform or reading the files directly.
+
+We recommend that store app-bundles and built-in app-bundles appear in
+separate cache files, for several reasons:
+
+* In the current design for Apertis operating system upgrades,
+  the metadata files for built-in app-bundles and platform components
+  in `/usr/Applications/*/share/*` and `/usr/share/*` are
+  only updated during an operating system upgrade, either by `dpkg` or
+  by unpacking a new OS filesystem hierarchy that will be activated
+  after the next reboot. In the `dpkg` case, it is sufficient to have a
+  `dpkg` trigger that monitors these directories and updates the built-in
+  app-bundle cache when they have changed, leaving the store app-bundle
+  cache unchanged. Similarly, in the whole-OS upgrade case, the built-in
+  app-bundle cache can be provided in the new OS filesystem or rebuilt
+  during startup, again leaving the store app-bundle cache unchanged.
+
+* Conversely, the metadata files for store app-bundles are updated by
+  the Ribchester subvolume manager when it installs a new store app-bundle,
+  which can occur at any time. When it does this, it is sufficient to update
+  the store app-bundle cache, leaving the built-in app-bundle cache unchanged.
+
+* If Apertis moves to a more static model for deployment of the platform
+  (for example using disk images or OSTree to deploy pre-built filesystem
+  hierarchies), the built-in app-bundle cache would be entirely static and
+  could be included in the pre-built filesystem hierarchy.
+
+* Using separate caches makes it straightforward to ensure that if
+  a store app-bundle with the same name as a built-in app-bundle is
+  somehow installed, the built-in app-bundle takes precedence.
+
+Any metadata keys and values that have not been standardized by the AppStream
+project (for example audio roles that might be used to determine a bundle's
+audio priority) must be represented using [][Extension points] within the
+AppStream metadata. The formal [AppStream specification][AppStream] does
+not provide an extension point, but the
+[reference implementation][libappstream]
+and [appstream-glib] both provide support for a `<custom>` element
+with `<value>` children. We recommend using that element for extension
+points. See the [Apertis Application Bundle Specification] for
+details.
+
+When a store or built-in app-bundle is added, removed or changed,
+the Apertis platform must update the corresponding cache file.
+
+#### Future directions
+
+AppStream XML is equally applicable to platform components, which can
+install metadata in `/usr/share/metainfo` in the same way as built-in
+app-bundles.
+
+Because built-in app-bundles and platform components have the same update
+schedule and are managed by the same vendor (they are both part of the
+platform), we anticipate that platform components should use the same cache
+file as built-in app-bundles.
+
+### Secure identification design
+
+Consumers of requests from app-bundles, such as the audio manager or the
+notifications implementation, must receive the bundle ID alongside the
+request using a trusted mechanism. If the request is received via D-Bus,
+the bundle ID must be retrieved by using
+the [GetConnectionCredentials] method call to receive the AppArmor context,
+then parsing the context to get the bundle ID and whether it is a
+store or built-in app-bundle. If the request takes the form
+of a direct `AF_UNIX` socket connection, the bundle ID must be retrieved
+by reading the `SO_PEERSEC` socket option, then parsed in the same way.
+Consumers of app-bundle priorities should do this by using the
+[CbyProcessInfo] objects provided by [libcanterbury][Canterbury].
+
+Because the Apertis [Security concept design] does not place a security
+boundary between different processes originating from the same app-bundle,
+all identification of app-bundles should be carried out using their
+bundle IDs. In particular, consumers of requests from app-bundles should
+only use the requester's AppArmor label to derive its bundle ID and
+whether it is a store or built-in app-bundle, and must not use the
+complete AppArmor label, the complete path of the executable or the
+name of the corresponding [entry point][entry points] in access-control
+decisions.
+
+### Labelling design
+
+[AppStream upstream XML] already contains standardized metadata
+fields for a name, author name, etc.
+
+The name (and several other metadata fields) can be translated via the
+`xml:lang` attribute. For example, GNOME Videos (Totem) has many
+language-specific names, starting with:
+
+---
+ <name>Videos</name>
+ <name xml:lang="af">Video's</name>
+ <name xml:lang="ar">فيديو</name>
+ <name xml:lang="as">ভিডিঅ'সমূহ</name>
+ <name xml:lang="be">Відэа</name>
+---
+
+AppStream upstream XML does not include an icon, although the derived
+[AppStream collection XML] format published by redistributors does.
We recommend
+that the app-bundle should contain a PNG icon whose name matches its bundle ID,
+installed to its `share/` directory as part of the `hicolor` fallback theme.
+
+> The reserved icon theme name `hicolor` is used as the fallback whenever
+> a specific theme does not have the required icon, as specified in the
+> [freedesktop.org Icon Theme specification]. The name `hicolor` was chosen
+> for historical reasons.
+
+For example, `com.example.ShoppingList` would include
+`/Applications/com.example.ShoppingList/share/icons/hicolor/64x64/apps/com.example.ShoppingList.png`.
+If the app-store uses AppStream collection XML, then the process used to
+build AppStream collection XML from individual applications' AppStream upstream
+XML files should assume this icon name and include it in the collection XML.
+
+**Open question:** We should require a specific size for the icon, to avoid
+blurry or blocky app icons caused by resizing. GNOME Software uses 64×64
+as its baseline requirement, but recommends larger icons, for example 256×256.
+[iOS][iOS icon sizes] uses 1024×1024 for the App Store and ranges from 60×60
+to 180×180 for on-device icons. [Android][Android icon sizes] uses 512×512
+for the Google Play Store and ranges from 36×36 to 96×96 for on-device icons.
+What are our preferred sizes?
+
+#### Future directions
+
+Platform components that are not part of an app-bundle do not have bundle IDs.
+We anticipate that [][Platform component metadata] might be identified by
+a separate identifier in the same reversed-DNS namespace, and that the
+consumer of requests might derive the platform component identifier by
+looking for components that declare metadata fields matching the
+requester's AppArmor label (part of the AppArmor context).
+
+## Summary
+
+* [][App-bundle metadata] is read from the cache that summarizes
+  built-in and store app-bundles. The [libcanterbury-platform][Canterbury]
+  library provides the required APIs; in particular, change notification
+  can be done using `inotify`.
+* [][Secure identification] is provided by parsing the requesting
+  process's AppArmor context, as described in
+  [][Secure identification design].
+* The [][Audio stream and notification requirements]
+  are addressed by providing their desired metadata in the app-bundle
+  metadata, in the form of arbitrary key/value pairs.
+* [][App-store curator oversight] is facilitated by documenting all
+  of the sources within a store app-bundle from which the implementation
+  gathers metadata to populate its cache.
+* [][Store app-bundle confidentiality] is provided by storing the cache
+  file describing installed store app-bundles in a location where store
+  app-bundles cannot read it, and by avoiding the need to introduce a
+  D-Bus service from which they could obtain the same information.
+* The [appstream-glib] library supports [][Extension points] in
+  AppStream XML.
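+
+For illustration, an extension point expressed with the `<custom>` element
+might look like this sketch (the key name is a hypothetical example,
+namespaced as recommended in [][Extension points]):
+
+---
+<custom>
+  <value key="com.example.AudioRole">music</value>
+</custom>
+---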
+ +<!-- External references --> +[Android App Manifest]: https://developer.android.com/guide/topics/manifest/manifest-intro.html +[Android icon sizes]: https://developer.android.com/guide/practices/ui_guidelines/icon_design_launcher.html +[Apertis Application Bundle Specification]: https://appdev.apertis.org/documentation/bundle-spec.html +[Apertis Glossary]: https://wiki.apertis.org/Glossary +[Applications concept design]: applications.md +[Application bundle]: https://wiki.apertis.org/Glossary#application-bundle +[AppStream]: https://www.freedesktop.org/software/appstream/docs/ +[AppStream collection XML]: https://www.freedesktop.org/software/appstream/docs/chap-CollectionData.html +[AppStream extension points]: https://lists.freedesktop.org/archives/appstream/2016-July/000048.html +[AppStream upstream XML]: https://www.freedesktop.org/software/appstream/docs/chap-Metadata.html +[AppStream-Glib]: https://github.com/hughsie/appstream-glib/ +[Canterbury]: https://gitlab.apertis.org/appfw/canterbury +[CbyProcessInfo]: https://gitlab.apertis.org/appfw/canterbury/blob/master/canterbury/process-info.h +[Compositor Security design]: https://wiki.apertis.org/Compositor_security +[D-Bus names]: https://dbus.freedesktop.org/doc/dbus-specification.html#message-protocol-names +[Deprecated AppStream path]: https://www.freedesktop.org/software/appstream/docs/chap-Metadata.html#spec-component-location +[Desktop Entry Specification]: https://specifications.freedesktop.org/desktop-entry-spec/desktop-entry-spec-latest.html +[Entry points]: https://wiki.apertis.org/Application_Entry_Points +[Flatpak]: http://flatpak.org/ +[freedesktop.org Icon Theme Specification]: http://standards.freedesktop.org/icon-theme-spec/icon-theme-spec-latest.html +[GetConnectionCredentials]: https://dbus.freedesktop.org/doc/dbus-specification.html#bus-messages-get-connection-credentials +[iOS icon sizes]: https://developer.apple.com/library/safari/documentation/UserExperience/Conceptual/MobileHIG/IconMatrix.html +[libappstream]: https://www.freedesktop.org/software/appstream/docs/api/index.html +[OEM]: https://wiki.apertis.org/Glossary#oem +[Namespaces in XML]: https://www.w3.org/TR/REC-xml-names/ +[Permissions]: permissions.md +[Permissions concept design]: permissions.md +[Platform]: https://wiki.apertis.org/Glossary#platform +[Security concept design]: security.md +[Snappy]: http://snapcraft.io/ +[Snappy metadata]: https://developer.ubuntu.com/en/snappy/guides/meta/ +[System extension]: applications.md#system-extensions +[X-Vendor]: https://specifications.freedesktop.org/desktop-entry-spec/desktop-entry-spec-latest.html#extending diff --git a/content/designs/application-entry-points.md b/content/designs/application-entry-points.md new file mode 100644 index 0000000000000000000000000000000000000000..32645a055c632460b6ad78b1ea395c909119a961 --- /dev/null +++ b/content/designs/application-entry-points.md @@ -0,0 +1,789 @@ +--- +title: Application entry points +short-description: Launchable programs and menu entries in app bundles (implemented) +authors: + - name: Simon McVittie +--- + +# Application entry points + +## Requirements + +[Application bundles] may contain *application entry points*, which are +any of these things: + +* a [graphical program] that would normally appear in a menu +* a graphical program that would not normally appear in a menu, but + can be launched in some other way, for example as a [content-type + handler][Content hand-over] +* a [user service] that starts during device startup +* a user service that 
is started on-demand + +Desktop environments provide metadata about these programs so that they +can be launched. + +At least the following use-cases exist: + +* mildenhall-launcher displays a categorized menu of user-facing + programs. Typical graphical programs such as the Rhayader web browser + must appear here, with a name and an icon. +* It must be possible to translate the name into multiple languages, + with a default (international English) name used for languages where + there is no specific translation. +* Different manufacturers' launcher implementations might have a different + taxonomy of categories for programs. +* If two graphical programs have the same user-facing name, it might be + useful to be able to provide a longer distinguishing name. For example, + if both Chrome and Firefox are installed, they might be called "Firefox + Browser" and "Chrome Browser", but if only one is installed, it might + simply be called "Browser". +* Certain graphical programs should be hidden from the menu, but treated + as a first-class program during user interaction. As of October 2015, + the Canterbury application manager has hard-coded special cases for + various removable storage browsing applications; an improved metadata + format would allow these special cases to be generalized. +* Some graphical programs present multiple *views* which may appear + separately in menus, but are all implemented in terms of the same running + process. For example, the Frampton audio player appears in the menu three + times, as "Albums", "Artists" and "Songs". However, ideally there would + only be one Frampton HMI process at any given time, even if the user + switches between views. +* Some programs should be started during device startup or user login. +* In the SDK images, Apertis applications and services should not + necessarily be listed in the XFCE menus, and XFCE applications should + not be listed in the "device simulator". + +### Security and privacy considerations + +The list of installed store application bundles in /Applications is +considered to be private information, for several reasons: + +* the general operating principle for Apertis' app framework is that apps + must not affect each other, except where given permission to interact, + ensuring "loose coupling" between apps +* the presence of certain app bundles might be considered to be sensitive + (for example, app bundles correlated with political or religious views) +* the complete list could be used for user fingerprinting, for example + guessing that users of an online service share a device by observing + that they have the same list of app-bundles + +The list of installed entry-points is almost equivalent to the list +of store application bundles and has similar considerations. However, +some components cannot work without a list of store application bundles, +or a list of their entry points. This leads to some privacy requirements: + +* Certain platform components such as the Canterbury app manager, the + Didcot content handover service, and the mildenhall-launcher app-launching + HMI require the ability to list store application bundles and/or their + entry points. They must be able to do so. +* Store applications with special permissions might also be allowed to + list store application bundles and/or their entry points. +* Store applications may list the entry points that advertise a particular + *public interface*, as defined in the [Interface discovery] design. 
+* Store applications without special permissions must not be able to
+  enumerate store application bundles that do not contain an entry point
+  advertising a public interface, either directly or by enumerating entry
+  points and inferring the existence of a bundle from its entry points.
+
+Unlike store application bundles, we suggest that the list of installed
+built-in application bundles in `/usr/Applications` should *not* be
+considered to be private. This list will be the same for every instance
+of the same platform image, so an application author could learn this
+list by querying the platform image variant and version, then matching
+that to a pre-prepared list of application bundles known to exist in
+their own copy of the same image. Conversely, because this list is the
+same for every instance of the same platform image, it is not useful
+for user fingerprinting.
+
+### Menu entries
+
+Optionally, a single entry point may be specified to provide an icon
+for presentation in the application launcher. If no icon is provided,
+it won't be obvious to the user that they have the application installed,
+so the application store screening process should carefully consider
+whether an application should be allowed to install services and type
+handlers with no icon for the launcher.
+
+The Applications concept design has historically assumed that application
+bundles should be constrained to contain at most one menu entry. However,
+one of the reference app-bundles developed as part of Apertis (the
+[Frampton media player][Frampton multiple entry points]) has multiple
+menu entries, so this document assumes that this constraint is no longer
+desired.
+
+### Agents
+
+Agents should be specified as entry points, with a localized list of names
+for the agent, along with the location of the executable file to launch.
+Since agents can be long-running and have an impact on device
+performance, any application with an agent should also set the `agent`
+[permission][permissions] so the user
+can choose not to install the application.
+
+### Non-requirements
+
+[System services] are outside the scope of this design.
+
+## Recommendation
+
+The [Apertis Application Bundle Specification] describes the fields
+that can appear in application entry points and are expected to
+remain supported long-term. This document provides rationale for those
+fields, suggested future directions, and details of functionality that
+is not necessarily long-term stable.
+
+### App identification
+
+Each built-in or store application bundle has a *bundle ID*, which is
+a [reversed domain name] such as `org.apertis.Frampton`.
+
+Each entry point within an application bundle has an *entry point ID*,
+which is a reversed domain name such as `org.apertis.Frampton.Agent`.
+
+For simple bundles with a single entry point, the bundle ID and the entry point
+ID should be equal.
+
+For more complex bundles with multiple entry points, the entry point ID should
+*start with* the bundle ID, but may have additional components.
+
+All names should be allocated in a namespace controlled by the author
+of the bundle — in particular, Apertis applications should be in
+`org.apertis`. Sample code that is not intended to be used in production
+should be placed in `com.example`, with `org.example` and `net.example`
+also available for code samples that need to demonstrate the interaction
+between multiple namespaces (we prefer `com.example`, as a hint to
+developers that reversed domain names do not always start with "org").
+ +### Desktop entries + +Each Apertis *application entry point* is represented by a standard +freedesktop.org [Desktop Entry][Desktop Entry Specification] (a +`.desktop` file in `XDG_DATA_DIRS/applications`). The desktop file must +be named using the entry point ID, so `org.apertis.Frampton.Agent` would have +`org.apertis.Frampton.Agent.desktop`. + +The [localestring] mechanism is used for translated strings. + +Built-in application bundles install their desktop +files in `${prefix}/share/applications`, which expands to +`/usr/Applications/${bundle_id}/share/applications`. They also install +symbolic links in `/usr/share/applications` pointing to the real files. It +is technically possible for any process to read this location. + +Store applications install their desktop files in +`${prefix}/share/applications`, which expands to +`/Applications/${bundle_id}/share/applications`. The app +installer is responsible for creating symbolic links in +`/var/lib/apertis_extensions/applications` pointing to the real +files. Only processes with appropriate permissions are allowed to read +these locations. + +Apertis applications must have the `X-Apertis-Type` key in their metadata, +so that they will be listed in Apertis. They should usually also have +`OnlyShowIn=Apertis;` so that they do not appear in the XFCE desktop +environment menu in SDK images. + +The value of the `Exec` key must start with an absolute path to the +executable below `${prefix}`. This ensures that the application framework +can detect which app-bundle the executable belongs to. + +Entry points that would not normally appear in a menu, including all +background services (agents), should have `NoDisplay=true`. + +The `Interfaces` key is used for [Interface discovery]. In particular, +the following interfaces are defined: + +* `org.apertis.GlobalSearchProvider`: Indicates that the application + is a global search provider, equivalent to the `supports-global-search` + schema entry. + +The standard `MimeType` key controls the possible [content-type +and URI-scheme associations][Content hand-over]. For example, +`x-scheme-handler/http` is used in desktop environments such as GNOME +to designate an application as capable of acting as a general-purpose +web browser, and we will do the same here. The Didcot service mediates +applications' access to this information; for example, it may set +priorities or ignore certain applications or associations altogether. + +Services that parse desktop files should use [the implementation in +GLib][GDesktopAppInfo], or an Apertis-specific API built on top of that +implementation. + +#### Additional recommended keys + +The following additional keys are defined in the `[Desktop Entry]` group. + +* `X-Apertis-ParentEntry` (string): For situations where multiple + menu entries start the same program in different modes, all but one + of those menu entries set `X-Apertis-ParentEntry` to the entry point ID of + the remaining menu entry. See [][Multiple-view applications] and + the [D-Bus Activation][Bundle spec D-Bus activation] section of the + Apertis Application Bundle Specification. + +* `X-Apertis-ServiceExec` (string): A command-line similar to `Exec` + that starts the entry point in the background, without + implicitly *activating* it (causing it to show a window) if it is + a graphical program. For example, entry points that use `GApplication` + will usually use the same executable as for `Exec` here, but add the + `--gapplication-service` option to it. 
+ See the [D-Bus Activation][Bundle spec D-Bus activation] section of the + Apertis Application Bundle Specification. + +* `X-GNOME-FullName` (localestring): The human-readable full name of the + application, such as `Rhayader Web Browser`. This key is already used + by the GLib library, and by desktop environments based on it (such as + GNOME). Like `Name`, this is a "localestring": non-English versions can be + provided with syntax like `X-GNOME-FullName[fr]=Navigateur Web Rhyader`. + +#### Potential future keys + +The following additional keys have been proposed for the `[Desktop Entry]` +group. + +* `X-Apertis-BandwidthPriority` (string): Bandwidth priority, currently + chosen from `highest`, `high`, `normal`, `low` or `lowest`. As a future + extension, numeric priorities could be added, with those strings mapped + to reasonable values. + +#### Audio roles + +> Requirements-gathering for the audio manager is ongoing. +> An `X-Apertis-AudioRole` key was initially proposed, but it seems +> likely that support for specifying a default audio role for PulseAudio +> streams that do not specify one will be moved from entry points +> into [application bundle metadata]. + +The audio role should have one of the [well-known media roles defined +by PulseAudio][PulseAudio media roles]. + +Additionally, Apertis defines the following roles. Their semantics are +not clear, and they should be clarified or deprecated. + +* `none` +* `interrupt` +* `record` (possibly the same thing as PulseAudio's `production`, + denoting an application that creates or edits audio files, such as a + sound recorder) +* `external` +* `unknown` + +#### Additional provisional/deprecated keys + +The following provisional keys are defined in the `[Desktop Entry]` group, +but are anticipated to be superseded, adjusted or redefined in future. + +* `X-Apertis-Type` (string): The application type, chosen + from `application`, `service`, `ext-app`, `agent-service`, + `startup-application`. Applications with no `X-Apertis-Type` are not + currently run or displayed in Apertis. This should eventually be replaced + with a set of boolean flags describing specific behaviours, such as + "start immediately" and "is expected to display a window"; these could + either be flags in the file, or indicated in another appropriate way, + for example a symbolic link in `/etc/xdg/autostart` for applications + and services that should be started immediately. +* `X-Apertis-CategoryLabel` (string; this would normally be a + localestring, but the current mildenhall-launcher relies on specific + string values for category labels, so translating it is not useful): + The name of the menu category. This will be implemented in the short + term to keep the current version of Mildenhall-Launcher operational, but + should be considered to be deprecated. Instead, launchers should parse + the standard `Categories` key, which contains a list of standardized + machine-readable categories with the possibility to add Apertis-specific + extensions, and translate those into the categories required by the + desired UX. +* `X-Apertis-CategoryIcon` (string): The short name of an icon for the + category, such as `icon_settings_AC`. In the short term, Canterbury + translates this to `/icon_settings_AC.png` to keep the current version + of Mildenhall-Launcher operational. Like `X-Apertis-CategoryLabel`, + this should be considered to be deprecated; instead, the launcher should + determine an icon name from the standard `Categories` key. 
+* `X-Apertis-BackgroundState` (string): What will happen to the
+  application when it is placed in the background: `running` (i.e. don't
+  kill), `stopped` (i.e. pause the process), `killed` (i.e. kill the
+  process). This key and its values should ideally be replaced with
+  something that more obviously describes an action rather than a state,
+  such as `kill`, `pause`, `continue`.
+* `X-Apertis-DataExchangeRules` (string): This appears to be something to
+  do with Didcot, but its semantics are unclear. The only known example is
+  `default-data`. It should be clarified or dropped.
+* `X-Apertis-ManifestUrl` (string): This appears to be intended to point
+  to the JSON manifest for the app bundle, but in the majority of the apps
+  that are currently implemented, it points to a nonexistent XML file, or
+  to the GSettings schema in which it is defined. It should be clarified
+  or dropped.
+* `X-Apertis-SplashScreen` (string): None of the current app bundles have
+  this, and it is unclear what its value is meant to be. It is currently
+  passed to the compositor via a D-Bus method call.
+
+#### Transitional considerations
+
+In addition to `/var/lib/apertis_extensions/applications`,
+Canterbury reads store app bundles' entry points
+from `/var/lib/MILDENHALL_extensions/applications` and
+`/var/lib/SAC_extensions/applications`, which are two older names for the
+same thing. We should remove that feature when everything has migrated to
+`/var/lib/apertis_extensions/applications`.
+
+Canterbury currently has special handling for the executable's arguments:
+
+* An argument named exactly `url` is assumed to be followed
+  by a placeholder; that placeholder is replaced by the
+  actual URL if the application is to be launched with a
+  URL argument. In the short term, this will be preserved. In
+  the longer term, Canterbury and applications should migrate to
+  [the standard `%u`, `%f`, `%U`, `%F` placeholders][Desktop Entry placeholders]
+  for a URL, filename, list of URLs or list of filenames
+  respectively.
+* An argument named exactly `app-name` is assumed to be followed by a
+  placeholder; that placeholder is replaced by the *entry point ID*. In the short
+  term, this will be preserved. In the longer term, this should be dropped;
+  applications should know their own entry point IDs.
+* An argument named exactly `play-mode` is assumed to be
+  followed by a placeholder; that placeholder is replaced by
+  `play` or `stop`. In the short term, this will be preserved. In
+  the longer term, media player applications should implement
+  [Desktop Entry actions] instead.
+
+There is currently special handling for several arguments with value
+exactly `### UNKNOWN ###`. In the long term this should be removed.
+
+In the long term, the category should be replaced by the standard
+`Categories` key, preferably with values chosen from the [XDG Desktop
+Menu specification][Desktop Menu categories]. This would allow for
+variants that do not use precisely the same taxonomy of applications as
+mildenhall-launcher; because `Categories` is a list, the launcher may
+use fine-grained categories if desired, falling back to more general
+top-level categories such as `AudioVideo` if it does not understand any
+more specific category.
+
+The application launcher HMI should translate these categories into
+whatever was specified by the variant's UX designer; for example,
+mildenhall-launcher would translate `Video` to "Video & TV", `Office`
+to "Productivity", and `Maps` to "Travel".
The application launcher HMI
+should also be responsible for presentational logic such as displaying
+"Travel" as "T R A V E L" if desired.
+
+#### Features with no direct replacement
+
+`env-key-value-pair` in the GSettings schemata does not currently appear
+to be used. We recommend removing this feature: application bundles
+should normally be written to not need a special environment. If they
+do need special environment variables, the desktop file could specify
+a shell script as its `Exec` program, with that shell script setting
+appropriate environment variables and then `exec`ing the real binary.
+
+`tile-thumbnails` in the GSettings schemata does not currently appear to
+be used. A replacement can be added when the requirements are more clear.
+
+### Simple applications (one entry point)
+
+This is the simple case where an entry point has one "view", for example
+the Rhayader web browser.
+
+We install symlinks in `/usr/share/applications` (for built-in
+app bundles) or `/var/lib/apertis_extensions/applications`
+(for store app bundles) pointing to the real file in
+`{/usr,}/Applications/${bundle_id}/share/applications`, with content
+similar to this.
+
+---
+# /usr/share/applications/org.apertis.Rhayader.desktop
+[Desktop Entry]
+Type=Application
+Name=Rhayader
+GenericName=Browser
+X-GNOME-FullName=Rhayader Browser
+Exec=/usr/Applications/org.apertis.Rhayader/bin/rhayader %U
+Path=/usr/Applications/org.apertis.Rhayader
+X-Apertis-Type=application
+X-Apertis-BandwidthPriority=normal
+Categories=Network;WebBrowser;
+MimeType=text/html;x-scheme-handler/http;x-scheme-handler/https;
+Icon=applications-internet
+---
+
+### Services
+
+Services are the same as applications (in particular, they have
+`Type=Application`), except for these special cases:
+
+* they have `NoDisplay=true` to hide them from the menus
+* the `X-Apertis-Type` is `service` or `agent-service`
+
+### Entry points which do not appear in the menus
+
+Some bundles might have an entry point that exists only to be started
+as a side-effect of other operations, for instance to [handle URIs
+and content-types][Content hand-over]. Those entry points would have
+`NoDisplay=true` to hide them from the menus; that is the only difference.
+
+### Multiple-view applications
+
+Some bundles have more than one entry in the system menus; the example we
+know about is Frampton. We propose to represent these with one `.desktop`
+file per menu entry.
+
+In this model, each menu entry is a `.desktop` file. Frampton
+would install `org.apertis.Frampton.Artists.desktop`,
+`org.apertis.Frampton.Songs.desktop` and
+`org.apertis.Frampton.Albums.desktop`. In addition, it would install
+`org.apertis.Frampton.desktop` with `NoDisplay=true`.
+
+The running instance of Frampton would always identify itself as
+`org.apertis.Frampton`, and the other three `.desktop` files use
+`X-Apertis-ParentEntry=org.apertis.Frampton` to link them to that name.
+
+When using [D-Bus activation][Desktop Entry D-Bus Activation]
+for applications (which is recommended), Frampton
+would have separate D-Bus `.service` files for all four names, would
+take all four bus names and their corresponding object paths at runtime,
+and would export the `org.freedesktop.Application` API at all four paths;
+but all of them would have `SystemdService=org.apertis.Frampton.service`
+to ensure that only one activation occurs. The `Activate`, `Open` or
+`ActivateAction` method on each bus name would open the relevant view.
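+
+As a sketch of one of those D-Bus service files (an illustration only; the
+`Exec` command line assumes the `--gapplication-service` convention used in
+the examples below):
+
+---
+# org.apertis.Frampton.Albums.service
+[D-BUS Service]
+Name=org.apertis.Frampton.Albums
+Exec=/usr/Applications/org.apertis.Frampton/bin/frampton --gapplication-service
+SystemdService=org.apertis.Frampton.service
+---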
+
+The result would look something like this:
+
+---
+# org.apertis.Frampton.desktop
+[Desktop Entry]
+Type=Application
+Name=Frampton
+GenericName=Audio Player
+X-GNOME-FullName=Frampton Audio Player
+Exec=/usr/Applications/org.apertis.Frampton/bin/frampton %F
+Path=/usr/Applications/org.apertis.Frampton
+X-Apertis-Type=application
+Categories=Audio;Player;Music;
+MimeType=audio/mpeg;
+NoDisplay=true
+Icon=music
+X-Apertis-ServiceExec=/usr/Applications/org.apertis.Frampton/bin/frampton --gapplication-service
+---
+
+---
+# org.apertis.Frampton.Artists.desktop
+[Desktop Entry]
+Type=Application
+Name=Frampton — Artists
+GenericName=Artists
+Exec=/usr/Applications/org.apertis.Frampton/bin/frampton --artists
+Path=/usr/Applications/org.apertis.Frampton
+X-Apertis-Type=application
+Categories=Audio;Player;Music;
+Icon=music-artist
+X-Apertis-ParentEntry=org.apertis.Frampton
+---
+
+---
+# org.apertis.Frampton.Albums.desktop
+[Desktop Entry]
+Type=Application
+Name=Frampton — Albums
+GenericName=Albums
+Exec=/usr/Applications/org.apertis.Frampton/bin/frampton --albums
+Path=/usr/Applications/org.apertis.Frampton
+X-Apertis-Type=application
+Categories=Audio;Player;Music;
+Icon=music-album
+X-Apertis-ParentEntry=org.apertis.Frampton
+---
+
+---
+# org.apertis.Frampton.Songs.desktop
+[Desktop Entry]
+Type=Application
+Name=Frampton — Songs
+GenericName=Songs
+Exec=/usr/Applications/org.apertis.Frampton/bin/frampton --songs
+Path=/usr/Applications/org.apertis.Frampton
+X-Apertis-Type=application
+Categories=Audio;Player;Music;
+Icon=music-track
+X-Apertis-ParentEntry=org.apertis.Frampton
+---
+
+## Appendix: GSettingsSchema-based entry point registration prior to October 2015
+
+As of early October 2015, Canterbury uses GSettings schemata for entry
+point registration. This is not an intended use of GSettings — the
+existence of an entry point is not a setting — and it should be avoided.
+
+Canterbury reads the schemata from the default system paths, and from
+a configurable path (`p_app_manager_get_store_apps_schema_path()`)
+which in practice resolves to `/Applications/System/registry`. For
+each schema in the path, if the name does not start with either
+`com.app` (this prefix is actually configurable, `project-domain`) or
+`org.secure_automotive_cloud.service`, then the schema is ignored.
+
+*Proposed replacement: Canterbury reads desktop files from at least
+`/var/lib/apertis_extensions/applications` and `/usr/share/applications`,
+and may read additional locations if desired. This should be
+done by setting Canterbury's `XDG_DATA_DIRS` to include at least
+`/var/lib/apertis_extensions` and `/usr/share`.*
+
+Canterbury reads the following keys, each with a corresponding constant
+such as `APP_NAME` except where noted:
+
+* `app-name` (`pAppName`): A string: entry point ID, such as `frampton`,
+  `Frampton-Agent`. *Proposed replacement: the name of the `.desktop` file.*
+* `background-state` (`uinBkgState`): One of { `running`, `stopped`,
+  `killed`, `unknown` }. *Proposed replacement: `X-Apertis-BackgroundState`.*
+* `working-directory` (`pWorkingDirectory`): A string: the app's
+  initial working directory, which in practice must be in its directory
+  `/usr/Applications/xyz` for built-in apps (Ribchester assumes this,
+  and uses it to create the app's storage during first-boot). *Proposed
+  replacement: the standard `Path` key.*
+* `exec-path` (`pExecutablePath`): A string: the executable.
*Proposed + replacement: the first word of the standard `Exec` key.* +* `exec-type` (`uinExecutableType`): One of { `application`, `service`, + `ext-application` (or sometimes `ext-app`, depending on project), + `agent-service`, `unknown` }. *Proposed replacement: `X-Apertis-Type`.* +* `exec-args` (`pExecutableArgv`): An array of (string, string) pairs + which are flattened into a single list for `exec()`, for example + `[('app-name', 'AudioPlayer'), ('menu-entry', 'A R T I S T S'), ('url', ' ')]` + turns into executing the equivalent of the Python code + `subprocess.call(['/usr/Applications/frampton/bin/frampton', 'app-name', 'AudioPlayer', 'menu-entry', 'A R T I S T S', 'url', ' '])`. + *Proposed replacement: the standard `Exec` key, except for its first word.* +* `internet-bw-prio` (`uinInternetBandwidthPriority`): One of { `highest`, + `high`, `mid`, `low`, `lowest`, `unknown` } or unspecified. *Proposed + replacement: `X-Apertis-BandwidthPriority`. Additionally, we recommend + accepting `normal` as a synonym for `mid`.* +* `splash-screen` (`pSplashScreen`): A string. No application specifies + this, so we do not know what its purpose is. *Proposed replacement: + `X-Apertis-SplashScreen`, or remove the feature.* +* `audio-resource-type` (`CANTERBURY_AUDIO_RESOURCE_TYPE`, + `uinAudioResourceType`): A `CanterburyAudioType`. *Proposed replacement: + use PulseAudio stream roles.* +* `audio-channel-name` (`pAudioChannelName`): A string: the name of the audio + channel. *Proposed replacement: make the audio manager derive the bundle ID + from the AppArmor profile in a way that cannot be faked by a malicious + app-bundle.* +* `audio-resource-owner` (`CANTERBURY_AUDIO_RESOURCE_OWNER`, + `pAudioResourceOwner`): A string: the entry point ID of the entry point that + will generate audio on behalf of this HMI. *Proposed replacement: make the + audio manager derive the bundle ID from the AppArmor profile in a way that + cannot be faked by a malicious app-bundle.* +* `category` (`pCategory`): A string: the displayed name of the + category. *Proposed replacement: `X-Apertis-CategoryLabel` in the short + term, `Categories` in the longer term.* +* `category-icon` (`pCategoryIcon`): A string: the name of the category + icon, of the form `/icon.png`, which appears to be relative to the + launcher's data directory. *Proposed replacement: `Icon`, changing + the value to be defined to be found via the freedesktop.org icon theme + specification.* +* `env-key-value-pair` (`ENV_KEY_VALUE`, `pEnvKeyValuePair`): An array + of strings, of even length: the environment of the subprocess. *Proposed + replacement: remove.* +* `window-name` (`APP_WIN_NAME`, `pAppWinName`): A string: the name + of the window that this HMI is expected to map. *Proposed replacement: + make the compositor derive the bundle ID from the AppArmor profile in a way + that cannot be faked by a malicious app-bundle.* +* `application-entry-names` (`APP_ENTRY_NAME_LIST`, + `pApplicationEntryName`): An array of strings. Each one is the title + of a quick-menu (right panel) entry in `mildenhall-launcher`. The + main-menu (left panel) entry is taken from `category` and + `category-icon`. *Implemented replacement: one desktop file per entry + point, and use its `X-GNOME-FullName`, `GenericName` and/or `Name`. 
The + ability to have more than one menu entry per application is replaced by + `X-Apertis-ParentEntry`.* +* `application-entry-icons` (`APP_ENTRY_ICON_LIST`, + `pApplicationEntryIcon`): An array of strings: `file:///` URLs to icons, + in the same order as `application-entry-names`. *Implemented replacement: + `Icon` may name either an icon in the icon theme, or a `file:///` URL.* +* `tile-thumbnails` (`APP_ENTRY_TILE_LIST`, `pApplicationTileThumbnail`): + An array of strings that represent home-screen tiles in some unspecified + way; we do not have any examples to use for reference. *Proposed + replacement: behave as though no application has a home-screen tile, + and design home-screen tiles separately.* +* `manifest-url` (`MANIFEST_FILE_URL`, `pAppManifestUrl`): A + string representing the manifest in some way. *Proposed replacement: + `X-Apertis-ManifestUrl`, or remove; services should find desktop files + for entry points in the standard way, and should find manifests for + app-bundles by looking in well-known locations for files whose names + are based on the bundle ID.* +* `app-settings-icon` (`pAppSettingsIcon`): A string representing an icon + used for settings. *Proposed replacement: icon from [application bundle + metadata].* +* `app-settings-name` (`pAppSettingsName`): A string representing a label + used for settings? *Proposed replacement: name from [application bundle + metadata].* +* `app-settings-path` (`pAppSettingsPath`): A string representing + a GSettings hierarchy used for settings. *Implemented replacement: + the settings schema (if any) whose name matches the bundle ID appears in + the system preferences UI; other schemas do not appear.* +* `mime-type` (`pMimeType`): An array of strings representing + content-types that this application can handle, and/or + pseudo-content-types such as `mt_app_settings`. *Implemented replacement: + `MimeType` for content-type and URL-scheme handlers; `Interfaces` to + discover other functionality.* +* `mime-list` (`pMimeList`): An array of strings representing some facet + of content type handling, with values such as `url`, `audio/mpeg` and + `launch`. *Proposed replacement: discover feature support with `Interfaces`.* +* `data-exchange-rules` (`DATA_EXCHANGE_FILE`, `pDataExchangeFile`): + A string which has something to do with data exchange, with the only + known value being `default-data`. *Proposed replacement: none.* +* `supports-global-search` (`SUPPORT_GLOBAL_SEARCH`, + `bSupportsGlobalSearch`): A boolean value indicating support for acting as + a global search provider. *Proposed replacement: if this would have been + true, then `org.apertis.GlobalSearchProvider` appears in `Interfaces`.* + +## Appendix: other approaches to multiple-view applications + +We considered some other approaches to this feature. + +### One `Desktop Action` per view + +In this model, each entry point (application or service) is a `.desktop` +file. 
Frampton would install `org.apertis.Frampton.desktop`, with contents
something like this:

```
# org.apertis.Frampton.desktop
[Desktop Entry]
Type=Application
Name=Frampton
GenericName=Audio Player
X-GNOME-FullName=Frampton Audio Player
Exec=/usr/Applications/org.apertis.Frampton/bin/frampton %F
Path=/usr/Applications/org.apertis.Frampton
X-Apertis-Type=application
X-Apertis-AudioRole=music
X-Apertis-AudioChannelName=org.apertis.Frampton.Agent
Categories=Audio;Player;Music;
MimeType=audio/mpeg;
NoDisplay=true
Actions=albums;artists;songs;
Icon=music

[Desktop Action artists]
Name=Artists
Icon=music-artist
Exec=/usr/Applications/org.apertis.Frampton/bin/frampton --artists
X-Apertis-ShowInMenu=true

[Desktop Action albums]
Name=Albums
Icon=music-album
Exec=/usr/Applications/org.apertis.Frampton/bin/frampton --albums
X-Apertis-ShowInMenu=true

[Desktop Action songs]
Name=Songs
Icon=music-track
Exec=/usr/Applications/org.apertis.Frampton/bin/frampton --songs
X-Apertis-ShowInMenu=true

# this is *not* a "quick menu" entry
[Desktop Action shuffle]
Name=Shuffle All
Icon=music-shuffle
Exec=/usr/Applications/org.apertis.Frampton/bin/frampton-control --shuffle-all
```

The Desktop Entry Specification specifies that application launchers should
present [desktop actions][Desktop Entry actions] to the user within
the context of an application, for instance as a submenu, but that
isn't how the UX of `mildenhall-launcher` works. We therefore use
`X-Apertis-ShowInMenu` to indicate that these particular desktop
actions should be made available to the user even though their parent
`org.apertis.Frampton` is not.

This could be combined with desktop actions as specified in the Desktop
Entry Specification if desired; those desktop actions would simply omit
`X-Apertis-ShowInMenu`. For example, if it were desirable for a long
press on Frampton's menu entries to result in a menu of actions such as
"shuffle all", "import from USB drive", "buy music", then those could
be represented as desktop actions.

### One Apertis-specific menu entry per view

This model is similar to the one with desktop actions, but it acknowledges
that desktop actions were not really designed to work that way, and uses
Apertis-specific syntax inspired by desktop actions instead:

```
# org.apertis.Frampton.desktop
[Desktop Entry]
Type=Application
Name=Frampton
GenericName=Audio Player
X-GNOME-FullName=Frampton Audio Player
Exec=/usr/Applications/org.apertis.Frampton/bin/frampton %F
Path=/usr/Applications/org.apertis.Frampton
X-Apertis-Type=application
X-Apertis-AudioRole=music
X-Apertis-AudioChannelName=org.apertis.Frampton.Agent
Categories=Audio;Player;Music;
MimeType=audio/mpeg;
X-Apertis-MenuEntries=albums;artists;songs;
Icon=music

[Apertis Menu Entry artists]
Name=Frampton — Artists
GenericName=Artists
Icon=music-artist
Exec=/usr/Applications/org.apertis.Frampton/bin/frampton --artists

[Apertis Menu Entry albums]
Name=Frampton — Albums
GenericName=Albums
Icon=music-album
Exec=/usr/Applications/org.apertis.Frampton/bin/frampton --albums

[Apertis Menu Entry songs]
Name=Frampton — Songs
GenericName=Songs
Icon=music-track
Exec=/usr/Applications/org.apertis.Frampton/bin/frampton --songs

[Desktop Action shuffle]
Name=Shuffle All
Icon=music-shuffle
Exec=/usr/Applications/org.apertis.Frampton/bin/frampton-control --shuffle-all
```

<!-- Other documents -->

[Content hand-over]: https://wiki.apertis.org/Content_hand-over
[Interface discovery]: https://wiki.apertis.org/Interface_discovery

[Apertis Application Bundle Specification]: https://appdev.apertis.org/documentation/bundle-spec.html
[Applications design document]: applications.md
[Application Entry Points]: https://wiki.apertis.org/Application_Entry_Points
[Application bundle metadata]: application-bundle-metadata.md
[App store approval]: https://wiki.apertis.org/App_Store_Approval
[Multimedia design]: multimedia.md
[Multimedia design document]: multimedia.md
[Multi-user design]: multiuser.md
[Multi-user design document]: multiuser.md
[Permissions]: applications.md#permissions
[Preferences and Persistence design document]: preferences-and-persistence.md
[System Update and Rollback design]: system-updates-and-rollback.md
[System Update and Rollback design document]: system-updates-and-rollback.md
[Security design]: security.md
[Security design document]: security.md

<!-- Glossary -->

[Application bundle]: https://wiki.apertis.org/Glossary#application-bundle
[Application bundles]: https://wiki.apertis.org/Glossary#application-bundles
[Graphical program]: https://wiki.apertis.org/Glossary#graphical-program
[Graphical programs]: https://wiki.apertis.org/Glossary#graphical-program
[System service]: https://wiki.apertis.org/Glossary#system-service
[System services]: https://wiki.apertis.org/Glossary#system-service
[User service]: https://wiki.apertis.org/Glossary#user-service
[User services]: https://wiki.apertis.org/Glossary#user-service

<!-- Other links -->

[Desktop Entry Specification]: http://standards.freedesktop.org/desktop-entry-spec/desktop-entry-spec-latest.html
[Desktop Entry actions]: http://standards.freedesktop.org/desktop-entry-spec/desktop-entry-spec-latest.html#extra-actions
[Desktop Entry D-Bus activation]: http://standards.freedesktop.org/desktop-entry-spec/desktop-entry-spec-latest.html#dbus
[Desktop Entry placeholders]: http://standards.freedesktop.org/desktop-entry-spec/latest/ar01s06.html#exec-variables
[Desktop Menu categories]:
http://standards.freedesktop.org/menu-spec/latest/apa.html
[Frampton multiple entry points]: https://gitlab.apertis.org/appfw/frampton/tree/v0.6.1/scripts
[GDesktopAppInfo]: https://git.gnome.org/browse/glib/tree/gio/gdesktopappinfo.c#n1997
[localestring]: https://specifications.freedesktop.org/desktop-entry-spec/desktop-entry-spec-latest.html#localized-keys
[PulseAudio media roles]: http://www.freedesktop.org/wiki/Software/PulseAudio/Documentation/Developer/Clients/ApplicationProperties/
[Reversed domain name]: https://en.wikipedia.org/wiki/Reverse_domain_name_notation

diff --git a/content/designs/application-framework.md b/content/designs/application-framework.md
new file mode 100644
index 0000000000000000000000000000000000000000..bc7957b678a8c2a61b7cbb1274582ff6e6d68d5f
--- /dev/null
+++ b/content/designs/application-framework.md
@@ -0,0 +1,714 @@
---
title: Application Framework
short-description: Ecosystem, Security, Compositor, Audio Management, Agents,
  Flatpak, and much more
authors:
  - name: Peter Senna Tschudin
  - name: Corentin Noël
  - name: Emanuele Aina
---

# The next-gen Apertis application framework

As a platform, Apertis needs a vibrant ecosystem to thrive, and one of the
foundations of such an ecosystem is being friendly to application developers and
product teams. Product teams and application developers are more likely to
choose Apertis if it offers flows for building, shipping, and updating
applications that are convenient, cheap, and that require low maintenance.

To reach that goal, a key guideline is to closely align with upstream solutions
that address those needs and integrate them into Apertis, to provide to
application authors a framework that is made of proven, stable, complete, and
well documented components.

The cornerstone of this new approach is the adoption of Flatpak, the modern
application system already officially supported on [more than 20 Linux
distributions](https://flatpak.org/setup/), including Ubuntu, Fedora, Red Hat
Enterprise Linux, Alpine, Arch, Debian, ChromeOS, and Raspbian.

The target audiences of this document are:
* for *Product Owners* and *Application Developers*, this document describes how
  the next-generation Apertis application framework creates a reliable platform
  with convenient and low maintenance flows for building, deploying, and
  updating applications;
* for *Apertis Developers*, this document offers details about the concepts
  behind the next-generation Apertis application framework and a high level
  implementation plan.

The goals of the next-generation Apertis application framework are:

* employ state-of-the-art technologies
* track upstream solutions
* expand the potential application developers pool
* leverage existing OSS documentation, tooling and workflows
* reduce ongoing maintenance efforts

The next-generation Apertis application framework is meant to provide a
superset of the features of the legacy application framework and base them
on proven upstream OSS components where possible.

## Creating a vibrant ecosystem

Successful platforms such as Android and iOS make the convenient availability
of applications a strategic tool for adding value to their platforms.

To be able to build an adequate number of applications with acceptable quality,
the entire platform is designed around convenience for developing, building,
deploying, and updating applications.

Given the relatively small scale of Apertis when compared to the Android and
iOS ecosystems, the best strategy is to align with the larger Linux ecosystem,
and Flatpak is the widely adopted solution to the previously listed challenges.

However, what makes Flatpak particularly compelling for Apertis is that
Flatpak effectively creates a shared development ecosystem that crosses the
distribution boundaries: while being able to automatically run any desktop
Flatpak on Apertis is an impressive technological feat, the biggest
benefit for Apertis is that by joining the Flatpak ecosystem the skills
developers need to learn to develop applications for Apertis become the same
as the ones needed to write applications aimed at all the mainstream Linux
desktop distributions. This significantly expands the potential developer pool
for Apertis, and ensures that the easily available online documentation
and workflows to build applications for the main Linux desktop distributions
also automatically apply to building applications for Apertis itself.

## The next-generation Apertis application framework

The next-generation Apertis application framework is a set of technologies
bringing applications to the state of the art in security and privacy
considerations.

With the use of modern tools, the framework is meant to grant the user
strict control over their data. Applications are meant to run contained,
and can talk with each other and with the rest of the system only
using dedicated interfaces.

The containment is designed to keep applications in their restricted
environment, and prevents them from modifying the base system in any way
unless they have been explicitly granted the ability to do so.

Whenever possible, applications have to define upfront their requirements
to access privileged resources, be it to share files across applications or
to get Internet access. It is up to the app store maintainers to review and
ensure that the requested access is sensible before it reaches final users.
For other more dynamic privileged resources, authorization can be granted
at runtime through explicit user interaction, usually via dedicated interfaces
called "portals".

Flatpak provides those guarantees by using the kernel namespacing and control
groups subsystems to implement containers, similarly to what Docker does.
Portals are then implemented as D-Bus interfaces that applications can invoke
to request privileged actions from inside their sandbox.

Access to the graphical session, both to render the application contents and to
manage input from users, is managed securely by a Wayland compositor.

Audio policies are extremely important for Apertis, especially so in automotive
environments, and PipeWire provides an excellent foundation to handle those by
providing the tools to wire applications to the needed resources in a secure
and customizable way.

Launching applications, agents, and other services happens through `systemd`,
which is in charge of running both the system and the user sessions. Systemd
provides a [wide set of options to further secure
services](https://gist.github.com/ageis/f5595e59b1cddb1513d1b425a323db04), track
their resource consumption, ensure their availability, etc.

## Application runtime: Flatpak

Flatpak is a framework with the goal of letting developers deploy and run
their applications on multiple Linux distributions with little effort.
To do
so, it decouples the application from the base OS: this decoupling also
allows an application to be deployed with no changes on different variants
of the same base OS, on different versions of the same base OS, or even to be
deployed alongside another application which needs an incompatible set of
libraries.

Decoupling the base OS from applications is particularly valuable for Apertis
since it allows applications to be deployed seamlessly over multiple variants
while minimizing the set of components shipped in the base OS.

Another interesting effect of the decoupling is that the release cycles of
applications are no longer tied to that of the base OS: while the latter
needs to go through a longer validation process, applications can release much
faster and in a completely independent way.

**Applications as made by the developer**

A Flatpak application is a self-contained application based on a runtime,
ensuring that the user runs the application the way the developer intended,
without depending on what is currently installed on the user's machine.

**Secure by design**

A Flatpak application runs confined in a restrictive security sandbox.
Updates for the application can be done quickly and atomically or according to
any system-wide policy. As Flatpak is vendor-agnostic, it makes it possible to
ensure that applications are genuine by signing both the applications and the
source store.

Flatpak at the moment does not support AppArmor to further confine applications.
Since Apertis makes heavy use of AppArmor to protect its services, we plan to
add AppArmor support to Flatpak to add another layer of defense to keep
applications confined and prevent them from doing unwanted changes to the base
operating system.

**Privacy**

Every application ships with a security profile that describes the requirements
of the application, and explicit consent from the user is needed to get access
to any service not described by the security profile.

**Integrated into the environment**

Flatpak implements the latest standards for building applications: using
reversed DNS domain name notation and the AppStream and Desktop Entry
specifications from freedesktop.org, developers have complete control over the
metadata of their applications and have suitable tools to provide rich
information describing them.

**Efficient and lightweight**

Flatpak is very efficient and doesn't require spending time configuring a
heterogeneous set of tools to work on a system. With libostree at the heart of
Flatpak, cutting-edge technology is used to reduce its footprint through
content deduplication. The deduplication results in consuming less disk space
and less network bandwidth.

**Release at your own pace**

Flatpak decouples applications from the underlying Operating System,
so that they can follow different release schedules, minimizing the impact of
conflicting changes: applications in Flatpak rely on a basic set of libraries
called "frameworks" that shield them from the actual libraries used by the OS.
OSTree helps to keep this redundancy under control, minimizing the storage
consumption by de-duplicating items in common.
Frameworks help to keep the base OS lean and minimal, as non-core libraries
can be moved closer to the applications that need them, and thus development
and validation can happen faster.
On the application side, new versions of basic libraries can be used without
fearing regressions in other applications, reducing the time to market.
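
As a brief illustration of this decoupling (a sketch only: the application ID
is an example reused from later in this document, and the commands assume the
Flatpak tools are already installed), the split between applications and
runtimes can be inspected from the command line:

```shell
# Show an application's metadata, including the runtime it was built against
flatpak info io.gitlab.Goodvibes

# List the runtimes installed independently of the base OS
flatpak list --runtime
```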

## Compositor: libweston

The compositor is the boundary between applications and the actual
human-machine interface: it is responsible for mediating access to the screen
and to the input devices, guaranteeing that each application only gets the
input commands directed to it and can't read or interfere with the resources
assigned to other applications.

The next-generation Apertis application framework continues to rely on the
Wayland protocol to let applications talk to the compositor in a secure,
efficient, and well-supported way.

The compositor is meant to be agnostic of the UI toolkit applications use,
and by sticking to the commonly implemented Wayland interfaces it supports
the main OSS UI toolkits out of the box, even running at the same time, with
no custom code being required on the application side.

While applications targeting the next-generation Apertis application framework
should work with any compliant Wayland compositor implementing the most common
extensions, Apertis plans to provide a reference compositor that aims to be
customizable for the different non-desktop use-cases targeted by Apertis.

The main requirement for the reference compositor is to be based on
`libweston`, as this library is a valuable asset of reusable code for
compositors originating from the Weston project.

A good starting point for the compositor reference implementation is to use the
[agl-compositor](https://gerrit.automotivelinux.org/gerrit/admin/repos/src/agl-compositor)
project, because it was purposely built as a reference implementation. Ease of
coding was a design goal, and it is expected that both the client shell and the
compositor itself are easy to understand and modify. The code base is small,
trim, maintained, and is currently evolving.

Additional features include support for clients using the XDG shell protocol,
and an example of a compositor private extension that allows the client shell
to provide additional roles to surfaces.

Another option for the reference compositor is the
[Maynard](https://gitlab.apertis.org/hmi/maynard) project. Unfortunately the
project is not currently maintained, and its internal architecture is
outdated: it builds Weston plugins out of tree, which was the recommended way
before libweston existed. The main issue with using Maynard is that, because
it is not maintained upstream, we would need to maintain it ourselves.

## Audio management: PipeWire and WirePlumber

Applications should be able to play sounds and capture the user's speech if
they desire to do so, but the system needs to guarantee that:

* applications cannot interfere with the audio streams of other applications;
* access to the audio captured by microphones is granted only on explicit
  authorization by the user whenever possible;
* on a multi-zone setup like on some cars, sounds are emitted in the zone
  where the application is displayed;
* important messages can be emitted in a clear, audible way even if other
  applications are already playing multimedia contents, by pausing the other
  streams whenever possible or mixing the streams at different volumes.

PipeWire is the current state-of-the-art solution for secure and efficient
audio routing. Applications can use it natively, from GStreamer, or via the
ALSA and PulseAudio compatibility layers, and it is designed to work well when
combined with the Flatpak sandboxing capabilities.
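
As a minimal sketch, assuming an image where the PipeWire command-line tools
and its GStreamer element are available (an assumption, not a current
guarantee), the audio graph can be inspected and exercised like this:

```shell
# Enumerate the nodes currently registered with the PipeWire daemon
pw-cli ls Node

# Play a test tone routed through PipeWire via its GStreamer element
gst-launch-1.0 audiotestsrc ! audioconvert ! pipewiresink
```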

Since PipeWire does not include any default policy engine, a separate component
is in charge of setting up the connections between the PipeWire nodes to ensure
that the system rules are enforced. The
[WirePlumber](https://gitlab.freedesktop.org/gkiagia/wireplumber) project from
AGL implements such a policy service, with goals and restrictions aligned to
the ones for Apertis.

## Session management: systemd

While not directly exposed to applications, session management is a fundamental
part of the application framework, with the purpose of:

* launching applications upon user request from the graphical launcher;
* running headless agents;
* activating session services needed by applications and agents;
* monitoring the life-cycle of applications and services;
* enforcing resource tracking on applications and services.

The systemd user session system provides the currently most advanced solution
to the above problem space, with the Apertis legacy application framework
already making use of it and other mainstream environments like GNOME being in
the process of completely switching to systemd to manage their sessions.

## Software distribution: hawkBit

For software distribution use-cases Apertis supports Eclipse hawkBit, a domain
independent back-end framework for rolling out software updates to constrained
edge devices as well as more powerful controllers and gateways connected to IP
based networking infrastructure. This software distribution system has to be
enhanced to gain Flatpak support.

With Flatpak, bundle repositories can be created and configured as needed,
and a single system can fetch applications from multiple repositories at the
same time.

Apertis will offer a reference instance where applications can be shared
and made available to all the Apertis users, to foster collaboration and to
provide a rich set of readily available applications.

Downstreams and product teams can set up their own instance to publish
applications intended for a more limited audience.

The Apertis reference store also builds on top of the Apertis GitLab code
hosting services to define a reproducible Continuous Integration workflow to
automatically build applications from source and publish them to the app store.

Once quality assurance has validated a specific version of an application,
an easy way is provided for the developer to publish it to the Apertis hawkBit
instance.

To ensure a good quality of service, and to be certain that the service matches
the expectations, Apertis core applications may themselves be shipped as Flatpak
bundles over the Apertis hawkBit instance.

## Evaluation

The next-generation application framework matches all the requirements that
have driven the development of the legacy application framework.

In particular, in no way does the next-generation application framework result
in a loss of functionality or features: it instead builds on top of mature,
proven technologies to expand what is possible with the legacy framework,
adapting to the evolving state-of-the-art application ecosystem on Linux.

The application framework is compliant with the current requirements of the
Apertis platform for [system services][System services],
[user services][User services], and [graphical programs][Graphical Programs].

It relies heavily on the freedesktop.org specifications that define where
applications can store their data with different guarantees, how their
metadata is to be encoded, and how they can best integrate with the system.

Flatpak uses `libostree` to implement robust application updates and rollbacks,
efficiently using network bandwidth and local storage. Updates are signed,
and the alternative signing mechanisms developed by Apertis for its system
updates can be used to avoid the GPL-3 issues related to the use of GnuPG.

The requirement of having a security boundary between applications is
addressed by the use of the control group and namespacing kernel subsystems.
The use of AppArmor can be introduced to add another layer of defense
to the already strong security provisions Flatpak offers.
Flatpak also lets applications be installed per-user, increasing the
separation on multi-user systems.

Application data and settings are stored inside the application sandbox,
ensuring that they are stored securely, that they can be managed easily for
rollback purposes, and that applications are free to choose any mechanisms
to manage them.

**App bundle contents**

The Flatpak [application bundle contents](https://github.com/flatpak/flatpak/wiki/Filesystem)
follow a well-defined application layout that largely matches the approach used
by the legacy application framework, improving over it in particular with the
introduction of "frameworks" as a way to decouple the application from the base
OS while retaining efficiency in terms of deploying updates affecting multiple
applications and in terms of storage consumption.

With the use of Flatpak frameworks, any language runtime can be used easily by
applications even if the base OS does not ship it.

**Data Management**

Flatpak applications can use the [XDG Base Directory Specification][] to find
the appropriate places to store persistent private data that can't be accessed
by other applications, and temporary cache files that can be deleted by the
system to reclaim space.

Policies for storage space reclaiming and rollback need to be defined and are
to be implemented in dedicated components.

**Sandboxing and security**

With the use of the control group and namespacing kernel subsystems, Flatpak
offers a state-of-the-art approach for containing applications, limiting what
they can access on the system and isolating them from each other.

The integrity of the application data is guaranteed by the namespaced
application filesystem being mounted read-only, and thus being unmodifiable by
the application itself, and by using namespaces to limit the amount of data
each application can access.

Applications cannot see the other installed and running applications, nor can
they modify them. They also can't communicate with each other without user
consent.

**App permissions**

The [Flatpak permissions][] system lets applications declare in advance any
needed permissions to access sensitive resources like user data or special
devices, to be reviewed by app store curators.

Additional runtime permissions to access data outside of what the
application normally needs can be granted via explicit user actions,
usually via dedicated Flatpak portals.

Integration with Flatpak portals to transparently grant applications privileged
access on explicit user actions is already available in the main application
toolkits like Qt, GTK, etc.
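
For example (a sketch only, reusing the Goodvibes application ID from the
examples later in this document), the declared permissions can be reviewed and
tightened from the command line:

```shell
# Review the permissions the application declared at build time
flatpak info --show-permissions io.gitlab.Goodvibes

# Tighten them for the current user, here revoking any access to the
# home directory
flatpak override --user --nofilesystem=home io.gitlab.Goodvibes
```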

**App launching**

Each installed Flatpak application automatically exports its `.desktop` entry
point, in a way that any compliant application launcher can automatically
list and start the installed Flatpak applications.

The applications themselves have to use the [Desktop Entry
Specification](https://standards.freedesktop.org/desktop-entry-spec/latest/)
to provide the required metadata and entry points.

It is possible for applications to explicitly specify that they should not be
listed in the launcher, to avoid headless agents polluting the menu.

**Document launching**

Applications and entry points can specify the media types they handle using the
[MIME type handling
provisions](https://standards.freedesktop.org/desktop-entry-spec/latest/ar01s10.html)
from the [Desktop Entry
Specification](https://standards.freedesktop.org/desktop-entry-spec/latest/).
The application framework is responsible for making the selected document
visible to the associated application, running the application if it wasn't
previously running, or queueing the file opening on busy systems.

**URI launching**

With the special `x-scheme-handler` MIME type, the same mechanism used for
*Document launching* can be used to handle specific URI schemes.
If the URI scheme is `file`, it is treated as launching a local document.

**Content selection**

Flatpak provides portals to let users explicitly grant access to any of their
files without any upfront special permissions being granted to the application.
Integration with the file selection portals is already available in the most
widespread OSS application toolkits.

**Data sharing**

Flatpak applications can be granted special permissions to access D-Bus
services or filesystem subtrees that can be used to share data across
a set of applications. Flatpak also lets applications be activated on demand
via D-Bus, which can be particularly useful for headless agents.

**Life cycle management**

Each Flatpak sandbox automatically contains all the application processes in a
secure and efficient way. The system user session management can add another
layer of control, tracking both application and system services with a
homogeneous approach.

The compositor can track to which process, and thus to which application or
service, each window belongs.

**Last used context**

Applications can store their last status in their private data area and have it
available on the next launch, enabling the simplest approach to be implemented
purely on the application side with no specific involvement of the
application framework.

More advanced use cases that may require a deeper involvement of the
application framework need to be evaluated.

**Installation management**

Flatpak allows applications to be installed system-wide or per-user, and provides
extensive tooling to retrieve contents from remote stores, list local applications,
and fetch updates.

The use of OSTree to store application contents makes rolling them back simple
and efficient. Data is not usually rolled back when rolling back an
application: if use-cases require data rollback, it needs to be implemented in
dedicated components.

Flatpak also provides both efficient online and offline installation mechanisms.

**Conditional access**

Flatpak lets applications be installed either system-wide, making them
available to every user, or per-user, where only users that have explicitly
installed an application can access it.

However, the latter means that storage is not de-duplicated. Advanced setups
may be defined to leverage the de-duplication capabilities of OSTree without
automatically sharing installed applications with every user of the system.

**UI customization**

One of the key values for Apertis is to be aligned with upstream, so the best
UI customization strategy is to rely on the upstream theming infrastructure
offered by toolkits like GTK.

Flatpak can [inject system themes in the containerized
runtimes](https://blog.tingping.se/2017/05/11/flatpak-theming.html) to apply
a global theme without changing anything in the applications.

## Focus on the development user experience

Flatpak provides extensive tooling to give developers a working environment
that is easy to set up and use: the framework provides the necessary tools and
libraries for developers to create their applications and is highly extensible.

A key part of delivering the best developer experience is promoting a default
Integrated Development Environment (IDE). As such, GNOME Builder is the current
best option for providing an efficient environment, bringing Flatpak in as a
first-class component, integrating with many languages, providing support for
the Git version control system, and being available on any Linux distribution
as it is itself distributed as a Flatpak.

As the framework is composed of a set of different tools interacting with each
other, it is also possible for the developer to use a classic workflow and use
the command line to build and install an application, guaranteeing the same
result independently of the machine it is built on and thus allowing fully
reproducible builds. As the framework itself is built upon existing technology,
it benefits from the broadly available documentation and supports the highly
heterogeneous build configurations that applications require.

Installing a Flatpak application from Flathub requires only a single command;
here is an example with Goodvibes, an Internet radio player
application:
```shell
flatpak install flathub io.gitlab.Goodvibes
```

The application can then be run by clicking on the desktop icon, or simply with:
```shell
flatpak run io.gitlab.Goodvibes
```

Each application can be defined using a [standard manifest][Manifest] that
describes all the dependencies, their sources, and how to build them. If a
dependency is not in the Apertis framework runtime, it can be added by the
developer in the definition file. These libraries aren't shared with the
base system, allowing the developer to ship the version of the dependency that
matches the needs of the software, without needing to wait for it to be
available in the system itself. A set of tools is even available for the
developer to build a runtime using the same dependencies that are available on
their machine.
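
As a sketch of what such a manifest looks like (the application ID, runtime
version and module below are purely illustrative, not an Apertis-defined
application), a minimal application can be described and built like this:

```shell
# Write a minimal manifest and build it; flatpak-builder fetches the
# runtime and SDK from the configured remotes
cat > org.example.Hello.yaml <<'EOF'
app-id: org.example.Hello
runtime: org.freedesktop.Platform
runtime-version: '22.08'
sdk: org.freedesktop.Sdk
command: hello.sh
modules:
  - name: hello
    buildsystem: simple
    build-commands:
      - install -Dm755 hello.sh /app/bin/hello.sh
    sources:
      - type: script
        dest-filename: hello.sh
        commands:
          - echo "Hello from inside the sandbox"
EOF
flatpak-builder --user --install build-dir org.example.Hello.yaml
```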

To illustrate the comprehensive coverage of Flatpak regarding the developer
experience, here are the few steps needed to build the Goodvibes application
that we previously mentioned:

 1) Get the manifest describing the dependencies from the original package:

```shell
flatpak run --command=cat io.gitlab.Goodvibes /app/manifest.json > io.gitlab.Goodvibes.json
```

 2) Build the Flatpak locally, allowing the dependencies to be installed from
    Flathub if required:

```shell
flatpak-builder --install-deps-from=flathub build-dir io.gitlab.Goodvibes.json
```

That's it, the Flatpak is now built.

 3) To test the result, you can directly use:

```shell
flatpak-builder --run build-dir io.gitlab.Goodvibes.json goodvibes
```

# Legacy Apertis application framework

Both the new and the legacy Apertis application frameworks are meant to be
available at the same time in Apertis as alternative options, with the legacy
framework being shipped on the reference images in the short term.
As the new application framework matures, components will get swapped on the
reference images, leaving the legacy components available in the archive.

See the [](canterbury-legacy-application-framework.md) for more details.

# High level implementation plan for the next-generation Apertis application framework

The transition to the new infrastructure can follow a process that keeps the
legacy framework fully available during the whole transition and ensures that
it still continues to work afterwards. Both frameworks will be in the Apertis
repositories as mutually exclusive options to be chosen by product teams based
on their needs.

The new Apertis application framework integrates with the existing QA and
testing platform for Apertis.

The implementation will proceed along a few different axes that can be
developed in parallel, in whatever order makes more sense at the time of the
implementation.

## Flatpak on the Apertis images
The goal here is to ensure that all the Flatpak tools and services are working on
the reference Apertis images.

1. Ensure that all the Flatpak tools are installed by default on the reference
   Apertis images:
   * target images have the tools needed to install, update, run, and remove
     Flatpak applications
   * SDK images also ship the tools needed to create Flatpak bundles
1. Test that a simple test application like GNOME Calculator can be installed
   on the reference Apertis images, that it gets displayed normally and that
   the user interaction is also working.
1. Test a more complex application like Goodvibes, and ensure that the audio
   playback is working.
1. Test more complex applications requiring GL rendering (for instance,
   OpenArena), and ensure that the open-source graphical rendering stack works.
   Testing the proprietary graphical stack is out of scope, as it does not
   provide the same levels of functionality and support when compared to the
   open source stack.
1. Taking the needs of the product teams into consideration, go through the
   list of official portals and ensure that they are functional.

## The Apertis Flatpak application runtime
The goals here are to create a reference Flatpak runtime for Apertis
applications and move all the applications to Flatpak.

To avoid bottlenecks, the Flatpak bundles produced in the steps described
here can be tested on any non-Apertis platform supporting Flatpak.

1.
Set up [Flatdeb] to automate the creation of Flatpak runtimes and
   Flatpak applications from `.deb` packages using the GitLab Continuous
   Integration pipelines.
1. Create a basic Apertis reference runtime aimed at headless agents and
   without legacy components like Mildenhall, built with [Flatdeb], similar to
   the FreeDesktop.org SDK.
1. Create a guide for product teams to create their own applications and
   runtimes using the Apertis tools.
1. Create a basic Flatpak runtime to run Mildenhall applications.
1. Convert the sample-apps to Flatpak using the Mildenhall runtime, starting
   from the simplest ones to the ones requiring the most interaction with the
   system. Ensure that each porting process is documented.
1. Coalesce the documentation in a comprehensive guide to convert legacy
   applications.
1. Convert more complex Mildenhall legacy applications like Frampton.
1. Create a legacy-free Apertis reference runtime for GUI applications.
1. Investigate more modern alternatives to the Mildenhall legacy demo
   applications and base them on the legacy-free Apertis reference runtimes.

## Implement a new reference graphical shell/compositor
This section is about deploying a new graphical shell based on modern
components and avoiding deprecated libraries like Clutter.

1. Begin with a new minimal shell based on the Weston Wayland compositor
   and make it available on the reference images, to be enabled optionally.
1. Ensure that legacy Mildenhall applications work properly under the new
   compositor.
1. Progressively add features like notifications and an application drawer to
   discover and launch applications.
1. Switch the default compositor from the legacy Mildenhall-Compositor to the
   new one.
1. Iteratively improve the look and feel of the shell.
1. Document how the shell can be customized or replaced by product teams
   while fully re-using the Weston core compositor implementation.

## Switch to PipeWire for audio management
The steps described here are about making audio management more secure and
flexible on Apertis.

1. Update the [Apertis audio management] design document to describe the
   different approach using [PipeWire] instead of PulseAudio.
1. Start the work using a basic policy with [WirePlumber] from AGL.
1. Ensure that audio capture is functional using a simple audio player
   application.
1. Ensure that video capture is functional using a simple camera viewer
   application.
1. Ensure that audio playback is functional without PulseAudio, but still
   default to PulseAudio for audio playback.
1. Ensure compatibility with applications using the PulseAudio client
   libraries to provide a smooth migration.
1. Switch the default for audio playback to PipeWire.
1. Progressively refine policies and introduce stream priority handling.
1. Provide a guide for product teams about customizing the audio
   management policies.

## AppArmor support
This section focuses on using AppArmor as an additional level of security
to constrain applications.

1. Add a basic AppArmor profile setup to Flatpak to ensure each application
   runs with its dedicated profile.
1. Progressively make the application profile more strict.
1. Customize the AppArmor profile based on the application permissions
   described in its manifest.

## The app-store
For the user-driven use-case it is key to demonstrate a full workflow that
includes an application store.

The store and the deployment management service are kept separate:
* the store is the front-end for the user and is the commercial layer of the
  system (payments, etc.);
* the deployment management service manages the actual installation of the
  software on the device based on the state of the store, but also deals with
  updates that do not go through the store.

1. Improve the reliability of the Apertis hawkBit instance.
1. Plug the Apertis hawkBit instance authentication system into the Apertis
   user database.
1. Extend the application building pipelines to push Apertis apps to hawkBit.
1. Extend the hawkBit agent to manage Flatpak applications.
1. Create and deploy a simple front-end store for applications, extending an
   existing e-commerce platform or adopting hawkBit-based solutions
   like the [Kuksa Appstore](https://github.com/eclipse/kuksa.cloud/tree/master/kuksa-appstore).
1. Ensure that the whole app-store workflow is documented and functional to
   handle user-driven installations and updates via hawkBit.
1. Extend the hawkBit agent and other tools to handle the
   [conditional access](https://wiki.apertis.org/Conditional_Access) use cases.
1. Provide a guide for product teams about deploying their own app-store.

[System services]: https://wiki.apertis.org/Glossary#system-service
[User services]: https://wiki.apertis.org/Glossary#user-service
[Graphical Programs]: https://wiki.apertis.org/Glossary#graphical-program
[Manifest]: http://docs.flatpak.org/en/latest/manifests.html
[XDG Base Directory Specification]: https://specifications.freedesktop.org/basedir-spec/basedir-spec-latest.html
[Flatpak permissions]: http://docs.flatpak.org/en/latest/sandbox-permissions.html
[Flatdeb]: https://gitlab.collabora.com/smcv/flatdeb
[PipeWire]: https://pipewire.org
[WirePlumber]: https://gitlab.freedesktop.org/gkiagia/wireplumber
[conditional access]: https://wiki.apertis.org/Conditional_Access
[Apertis audio management]: https://designs.apertis.org/latest/audio-management.html

diff --git a/content/designs/application-layout.md b/content/designs/application-layout.md
new file mode 100644
index 0000000000000000000000000000000000000000..7223378671f5dd33cdbe896ba890cdbd69c9856e
--- /dev/null
+++ b/content/designs/application-layout.md
@@ -0,0 +1,1355 @@
---
title: Application layout
short-description: Layout of files and directories inside an app bundle
authors:
  - name: Simon McVittie
---

# Application layout

Application bundles in the Apertis system may require several categories
of storage, and to be able to write correct AppArmor profiles, we need
to be able to restrict each of those categories of storage to a known
directory.

This document is intended to update and partially supersede
discussions of storage locations in the [](applications.md) and
[](system-updates-and-rollback.md) design documents.

The [Apertis Application Bundle Specification] describes the files
that can appear in an application bundle and are expected to
remain supported long-term. This document provides rationale for those
categories of files, suggested future directions, and details of
functionality that is not necessarily long-term stable.

## Requirements

### Static files

* Most application bundles will contain one or more executable
  [programs][program], in the form of either compiled machine code or
  scripts. These are read-only and executable, and are updated when the
  bundle is updated (and at no other time).

  * Some of these programs are designed to be run directly by a
    user. These are traditionally installed in `/usr/bin` on Unix
    systems. Other programs are *supporting programs*, designed to be
    run internally by programs or libraries. These are traditionally
    installed in `/usr/libexec` (or sometimes `/usr/lib`) on Unix
    systems. Apertis does not require a technical distinction between
    these categories of program, but it would be convenient for them
    to be installed in a layout similar to the traditional one.

* Application bundles that contain compiled executables may contain
  *private shared libraries*, in addition to those provided by the
  [platform], to support the executable. These are read-only ELF shared
  libraries, and are updated when the bundle is updated.
  * For example, [Frampton] has a private shared library
    [libframptonagentiface] containing GDBus interfaces.
* Application bundles may contain dynamically-loaded *plugins* (also
  known as loadable modules). These are also read-only ELF shared libraries.
* Application bundles may contain static *resource files* such as
  `.gresource` resource bundles, icons, fonts, or sample content. These
  are read-only, and are updated when the bundle is updated.
  * Where possible, application bundles should
    [embed resources in the executable or library using GResource][GResource].
    However,
    there are some situations in which this might not be possible,
    which will result in storing resource files in the filesystem:
    * if the application will load the resource via an API that is
      not compatible with GResource, but requires a real file
    * if the resource is extremely large
    * if the resource will be read by other programs, such as the
      icon that will be used by the app-launcher, the .desktop file
      describing an entry point (used by Canterbury, Didcot etc.),
      or D-Bus service files (used by dbus-daemon)
  * If a separate `.gresource` file is used, for example for programs
    written in JavaScript or Python, then that file needs to be stored
    somewhere.
* The AppArmor profile for an application bundle must allow that
  application bundle to read, mmap and execute its own static files.
* The AppArmor profile for an application bundle must not allow that
  application bundle to *write* its own static files, because they are
  meant to be static. In particular, the AppArmor profile itself must not
  be modifiable.

### Variable files

* The programs in application bundles may save variable data
  (configuration, state and/or cached files) for each [user]. ([Applications
  design] §3, "Data Storage")
  * *Configuration* is any setting or preference for which there is a
    reasonable default value. If configuration is deleted, the expected
    result is that the user is annoyed by the preference being reset,
    but nothing important has been lost.
  * *Cached files* are files that have a canonical version stored
    elsewhere, and so can be deleted at any time without any effect,
    other than performance, resource usage, or limited functionality
    in the absence of an Internet connection. For example, a client
    for "tile map" services like Google Maps or OpenStreetMap should
    store map tiles in its cache directory. If cached files are deleted,
    the expected result is that the system is slower or less featureful
    until an automated process can refill the cache.
+ * Non-configuration, non-cache data includes documents written by the + user, database-like content such as a contact list or address book, + license keys, and other unrecoverable data. It is usually considered + valuable to the user and should not be deleted, except on the user's + request. If non-configuration, non-cache data is unintentionally + deleted, the expected result is that the user will try to restore + it from a backup. +* The programs in application bundles may save variable data + (configuration, state and/or cached files) that are shared between all + [users][user]. ([Applications design] §3, "Data Storage") +* [Newport needs to be able to write downloaded files][bug 283] to a + designated directory owned by the application bundle. + * Because Newport is a platform service, its AppArmor profile will + need to be allowed to write to *all* apps' directories. + * Because downloads might contain private information, Newport must + download to a user- and bundle-specific location. +* The AppArmor profile for an application bundle must allow that + application bundle to read and write its own variable files. +* The AppArmor profile for an application bundle should not allow + that application bundle to execute its own variable files ("write + xor execute"), making a broad class of arbitrary-code-execution + vulnerabilities considerably more difficult to exploit. +* Large media files such as music and videos should normally be + shared between all [users][user] and all multimedia application + bundles. ([Multi-user design] §3, "Requirements") + +### Upgrade, rollback, reset and uninstall + +#### Store applications + +Suppose we have a [store application bundle], Shopping List version 23, +which stores each user's grocery list in a flat file. A new version 24 +becomes available; this version stores each user's grocery list in a +SQLite database. + +* Shopping List can be installed and upgraded. This must be relatively + rapid. +* Before upgrade from version 23 to version 24, the system should make + version 23 save its state and exit, terminating it forcibly if necessary, + so that processes from version 23 do not observe version 24 files or + any intermediate state, which would be likely to break their assumptions + and cause a crash. + * This matches the user experience seen on Android: graphical and + background processes from an upgraded `.apk` are terminated during + upgrade. +* Before upgrade from version 23 to version 24, the system must take a + copy (snapshot) of each user's data for this application bundle. +* After upgrade from version 23 to version 24, the current data will + still be in the version 23 format (a flat file). +* When a user runs version 24, the application bundle may convert the data + to version 24 format if desired. This is the application author's choice. +* If a user rolls back Shopping List from version 24 to version 23, + the system must restore the saved data from version 23 for each + user. ([Applications design] §4.1.5, "Store Applications — Roll-back") + * This is because the application author might have chosen to use + an incompatible format for version 24, as we have assumed here. + * For simplicity, we do not require a way for application authors + to avoid the data being rolled back. +* Shopping List can be uninstalled. This must be relatively + rapid. ([Applications design] §4.1.4, "Store Applications — Removal") +* When Shopping List is uninstalled from the system, the system must + remove all associated data, for all users. 
+ * If a multi-user system emulates a per-user choice of apps by hiding + or showing apps separately on a per-user basis, it should delete + user data at the expected time: if user 1 "uninstalls" Shopping List, + but user 2 still wants it installed, the system may delete user 1's + data immediately. +* To save space, *cache files* (defined to mean files that can easily be + re-created, for example by downloading them) should not be included in + snapshots. Instead of being rolled back, these files should be deleted + during a rollback. ([System Update and Rollback design] §6.3, "Update + and Rollback Procedure") + +* **Unresolved:** [][Are downloads rolled back?] + +#### Built-in applications + +By definition, [built-in application bundles] are part of the same +filesystem image as the platform. They are upgraded and/or rolled back +with the platform. Suppose platform version 2 has a built-in application +bundle, Browser version 17. A new platform version 3 becomes available, +containing Browser version 18. + +* The platform can be upgraded. This does not need to be particularly + rapid: a platform upgrade is a major operation which requires rebooting, + etc. anyway. +* Before upgrade from version 2 to version 3, the system must take a copy + (snapshot) of each user's data for each built-in application bundle. +* Immediately after upgrade, the data is still in the format used by + Browser version 17. +* If the platform is rolled back from version 3 to version 2, the system + must restore the saved data from platform version 2 for every built-in + application, across all users. ([Applications design] §4.2.4, "Built-in + Applications — Rollback"; [System Update and Rollback design] §6.3, + "Update and Rollback Procedure") +* Uninstalling a built-in application bundle is not possible + ([Applications design] §4.2.3, "Built-in Applications — Removal") + but it should be possible to delete all of its variable data, with the + same practical result as if an equivalent store application bundle had + been uninstalled and immediately reinstalled. +* Cache files for built-in applications are treated the same as cache + files for [][Store applications], above. + +#### Global operations + +User accounts can be created and/or deleted. + +* Deleting a user account does not need to be as rapid as uninstalling + an application bundle. It should delete that user's per-user data in + all application bundles. + +A "data reset" operation affects the entire system. It clears everything. + +* A "data reset" does not need to be as rapid as uninstalling an + application bundle. It should delete all variable data in each application + bundle, and all variable data that is shared by application bundles. + +**Unresolved:** [][Does data reset uninstall apps?] + +### System extensions + +Bundles with sufficient [store curator approval][App Store Approval] +and permissions flags may install *system extensions* which will be +loaded automatically by platform components. The required permissions +may vary according to the type of system extension. For example, a +privileged system-wide systemd unit should be a "red flag" which is +normally only allowed in built-in applications, whereas a `.desktop` +file for a [menu entry][Application Entry Points] should normally be +allowed in store bundles, provided that its name matches the relevant +ISV's reversed domain name. 

#### Public system extensions

Depending on the type of system extension, an extension might also be
intended to be loaded directly by store applications. For example, every
store application should normally load the current user interface theme,
and the set of icons associated with that theme (although each store
application bundle may augment these with its own private theming and
icon data if desired). We refer to extensions of this type as *public
system extensions*, analogous to the *public interfaces* defined by the
[Interface discovery] design.

### Security and privacy considerations

* Given an AppArmor profile name, it must be easy to determine (for
  example via a library API provided by Canterbury) whether the program
  is part of a built-in application bundle, a store application bundle,
  or the platform. For application bundles, it must be easy to determine
  the bundle ID. This is because the uid and the AppArmor profile name
  are the only information available to services like Newport that receive
  requests via D-Bus.
* Similarly, given a bundle ID and whether the program is part of a
  built-in or store application, it must be easy to determine where it
  may write. Again, this is for services like Newport.
* If existing open source software is included in an application bundle,
  it may read configuration from `$prefix/etc` with the assumption that
  this path is trusted. Accordingly, we should not normally allow writing to
  `$prefix/etc`.
* The set of installed store application bundles is considered to be
  confidential, therefore typical application bundles (with no special
  permissions) must not be able to enumerate the entry points, systemd
  units, D-Bus services, icons etc. provided by store application bundles. A
  permission flag could be provided to make an exception to this rule, for
  example for an application-launcher application like Android's Trebuchet.
  * **Unresolved:** [][Are inactive themes visible to all?]
* **Unresolved:** [][Are built-in bundles visible to all?]

### Miscellaneous

* Directory names should be namespaced by [reversed domain names], so
  that it is not a problem if two different vendors produce an app-bundle
  with a generic name like "Navigation".
* Because we recommend the GNU Autotools (autoconf, automake, libtool),
  the desired layout should be easy to arrange by using configure options
  such as `--prefix`, in a way that can be standardized by build and
  packaging tools (see the sketch after this list).
* Where possible, functions in standard open-source libraries in our
  stack, such as GLib, Gtk and Clutter, should "do the right thing". For
  example, `g_get_cache_dir()` should continue to be the correct function
  to call to get a parent directory for an application's cache.
* Where possible, functions in other standard open-source libraries such
  as Qt and SDL should generally also behave as we would want. This can be
  achieved by making use of common Linux conventions such as the [XDG Base
  Directory specification] where possible. However, these other libraries
  are likely to have less strong integration with the Apertis platform
  in general, so there may be pragmatic exceptions to this principle:
  full compatibility with these libraries is a low priority.
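
As a sketch of the Autotools arrangement mentioned above (the bundle ID,
prefix and staging variable are hypothetical placeholders, not a settled
Apertis decision):

```shell
# Configure a bundled program so that its static files are installed
# under the bundle's own prefix
./configure --prefix=/Applications/com.example.ShoppingList
make
make install DESTDIR="$staging_dir"   # $staging_dir is illustrative
```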
+ +## Provisional recommendations + +The overall structure of these recommendations is believed to be valid, +but the exact paths used may be subject to change, depending on the +answers to the [][Unresolved design questions] and comparison with +containerization technologies such as [][Flatpak]. + +### Writing application bundles + +> Application bundle authors should refer to the +> [Apertis Application Bundle Specification] instead of this section. +> This section might describe functionality that is outdated or has not +> yet been implemented. + +#### Static data + +For system-wide static data, programs in application bundles should: + +* link against private shared libraries in the Automake `$libdir` or + `$pkglibdir` via the `DT_RPATH` (libtool will do this automatically) +* link against public shared libraries provided by the platform in the + compiler's default search path, without a `DT_RPATH` (again, libtool + will do this automatically) +* run executables from the platform, if required, using the normal + `$PATH` search +* run other executables from the same bundle using paths in the Automake + `$bindir`, `$libexecdir` or `$pkglibexecdir` +* load static data from the Automake `$datadir`, `$pkgdatadir`, + `$libdir` and/or `$pkglibdir` (using the data directories for + architecture-independent data, and the library directories for data that + may be architecture-specific) + * where possible, resource files should be embedded in the executable + or library using GResource; if that is not possible, they can be + included in a `.gresource` resource bundle in the `$datadir` or + `$pkgdatadir`; if that is not possible either, they can be ordinary + files in the `$datadir` or `$pkgdatadir` + * load plugins from the Automake `$pkglibdir` or a subdirectory +* install system extensions to the appropriate subdirectories of + `$datadir` and `$prefix/lib`, if used: + * `.desktop` files describing entry points (applications and agents) + in `$datadir/applications` + * D-Bus session services in `$datadir/dbus-1/services` + * D-Bus system services in `$datadir/dbus-1/system-services` + * systemd user units in `$prefix/lib/systemd/user` + * systemd system units in `$prefix/lib/systemd/system` + * icons in subdirectories of `$datadir/icons` according to the + [freedesktop.org Icon Theme Specification] + +All of these paths will be part of the application bundle. For store +applications, they will be installed, upgraded, rolled back and removed +as a unit. For built-in applications, all of these paths will be part +of the platform image. + +#### Icons and themes + +*This section might be split out into a separate design document as more +requirements become available.* + +Icons should be installed according to the [freedesktop.org Icon Theme +specification]. + +If an application bundle installs a general-purpose icon that +should represent an included application throughout the Apertis +system, it should be installed in the `hicolor` fallback theme, +i.e. `$datadir/icons/hicolor/$size/apps/$app_id.$format`, where `$size` +is either a pixel-size or `scalable`, and `$format` is `png` or `svg`. + +> The reserved icon theme name `hicolor` is used as the fallback whenever +> a specific theme does not have the required icon, as specified in the +> [freedesktop.org Icon Theme specification]. The name `hicolor` was chosen +> for historical reasons. 
+
+If an application author knows about specific icon themes and wishes
+to install additional icons styled to coordinate with those themes,
+they may create `$datadir/icons/$theme_name/$size/apps/$app_id.$format`
+for that purpose. This should not be done for themes where the desired
+icon is simply a copy of the `hicolor` icon.
+
+*Rationale:* Suppose there is a popular theme named
+`org.example.metallic`, and a popular application named
+`com.example.ShoppingList`. If the author of Shopping List has
+designed an icon that matches the metallic theme, we would like
+the application launcher to use that icon. If not, the author of the
+metallic theme might have included an icon in their theme that matches
+this popular application; we would like to use that icon as our second
+preference. Finally, if there is no metallic-styled icon available,
+the launcher should use the application's theme-agnostic icon from the
+`hicolor` fallback directory. We can achieve this result by placing
+icons from each app bundle's `$datadir` in an early position in the
+launcher's `XDG_DATA_DIRS`, and placing icons from the theme itself in
+a later position in `XDG_DATA_DIRS`: the freedesktop Icon Theme lookup
+algorithm will look for a metallic icon in all the directories listed in
+`XDG_DATA_DIRS` before it falls back to the `hicolor` theme.
+
+The application may install additional icons representing actions,
+file types, emoticons, status indications and so on into its
+`$datadir/icons`. For example, a web browser might require an icon
+representing "incognito mode", which is probably not present in all icon
+themes. Similar to the application icon, the browser may install variants
+of that icon for themes other than `hicolor`, if its author is aware of
+particular themes and intends the icon to coordinate with those themes.
+
+**Unresolved:** [][Standard icon sizes?]
+
+#### Per-user, per-bundle data
+
+For *cache files* that are specific to the application and also
+specific to a user, programs in application bundles may read and write the
+directory given by `g_get_user_cache_dir()` or by the environment variable
+`XDG_CACHE_HOME`. This location is kept intact during upgrades, but is not
+included in the snapshot made during upgrade, so it is effectively emptied
+during rollback. It is also removed by uninstallation or a data reset.
+
+For *configuration* that is specific to the application and also specific
+to a user, the preferred API is the `GSettings` abstraction described in
+the [Preferences and Persistence design document]. As an alternative to
+that API, programs in application bundles may read and write the directory
+given by `g_get_user_config_dir()`, or equivalently by the environment
+variable `XDG_CONFIG_HOME`. This location is kept intact and also backed
+up during upgrades, restored to its old contents during a rollback, and
+removed by uninstallation of the bundle, deletion of the user account,
+or a data reset.
+
+For other variable data that is specific to the application and also
+specific to a user, programs in application bundles may read and write
+the directory given by `g_get_user_data_dir()`, or equivalently by the
+environment variable `XDG_DATA_HOME`. This location has the same upgrade,
+rollback and removal behaviours as `g_get_user_config_dir()`. Applications
+may distinguish between configuration and other variable data, but we
+do not anticipate that this will be necessary in Apertis.
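+
+To illustrate the intended developer experience, these per-user,
+per-bundle locations are obtained with ordinary GLib calls. A minimal
+sketch, assuming the environment configuration described later under
+[][Special directory configuration]; the file name is purely
+illustrative:
+
+---
+#include <glib.h>
+
+int
+main (void)
+{
+  /* In an app-bundle these resolve to the bundle's private per-user
+   * areas, e.g. /var/Applications/$bundle_id/users/$uid/{cache,config,data} */
+  gchar *db = g_build_filename (g_get_user_data_dir (), "shopping.db", NULL);
+
+  g_print ("cache:  %s\n", g_get_user_cache_dir ());
+  g_print ("config: %s\n", g_get_user_config_dir ());
+  g_print ("data:   %s\n", db);
+
+  g_free (db);
+  return 0;
+}
+---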
+
+For downloads, programs in application bundles may read and write the
+result of `g_get_user_special_dir (G_USER_DIRECTORY_DOWNLOAD)`. Each
+application bundle may assume that it has a download directory per user,
+shared by all programs in the bundle, but separate from other users
+and other application bundles. The
+download service, Newport, may also write to this location. Uninstalling
+the application bundle or removing the user account causes the download
+directory to be deleted.
+
+**Unresolved:** [][Are downloads rolled back?]
+
+#### Per-user, bundle-independent data
+
+For variable data that is shared between all applications but specific to
+a user, programs in application bundles may read and write locations in
+the user's subdirectory of `/home` if they have appropriate permissions
+flags for their AppArmor profiles to allow it. We should restrict this
+capability, because it may affect the behaviour of other applications.
+
+These locations should not be what is returned by `g_get_user_config_dir()`
+and related functions,
+because we want the default to be that app bundles are self-contained. We
+could potentially provide a way to arrange for specific directories
+to be symlinked or bind-mounted into the normally-app-specific
+`g_get_user_config_dir()` and so on.
+
+These locations are not subject to upgrade or rollback, and are never
+cleared or removed by uninstalling an app-bundle. They are cleared when
+the user account is deleted, or when a data-reset is performed on the
+entire device.
+
+**Unresolved:** [][How do bundles discover the per-user, bundle-independent location?]
+
+**Unresolved:** [][Is `g_get_home_dir()` bundle-independent?]
+
+#### User-independent, per-bundle data
+
+> As of Apertis 16.12, this feature has not yet been implemented.
+
+For variable data that is specific to the application but
+shared between all users, programs in application bundles
+may read and write `/var/Applications/$bundle_id/cache`,
+`/var/Applications/$bundle_id/config` and/or
+`/var/Applications/$bundle_id/data`. Convenience APIs to construct these
+paths should be provided in libcanterbury. Ribchester should create and
+chmod these directories if and only if the app has a permissions flag
+saying it uses them, so that the system will deny access otherwise.
+
+These locations have the same upgrade and rollback behaviour as the
+per-user, per-bundle data areas. They are deleted by a whole-device data
+reset, but are not deleted if an individual user account is removed.
+
+#### Shared data
+
+For media files, programs in application bundles may read and write
+the result of `g_get_user_special_dir (G_USER_DIRECTORY_MUSIC)` and/or
+`g_get_user_special_dir (G_USER_DIRECTORY_VIDEOS)`. These locations are
+shared between users and between bundles. The platform may deny access
+to these locations to bundles that do not have a special permissions flag.
+
+For other variable data that is shared between all applications and all
+users, programs in application bundles may read and write the result of
+`g_get_user_special_dir (G_USER_DIRECTORY_PUBLIC_SHARE)`. The platform
+may deny access to this location to bundles that do not have a special
+permissions flag. This location is shared between users and between
+bundles.
+
+These locations are unaffected by upgrade or rollback, but will be
+cleared by a data reset.
+
+#### Other well-known directories
+
+**Unresolved:** [][Is `PICTURES` per-user?]
+
+**Unresolved:** [][What is the scope of `DESKTOP`, `DOCUMENTS`, `TEMPLATES`?]
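+
+To summarise, the shared and per-user special directories above are all
+reached through the same GLib call. A minimal sketch; the paths printed
+depend on the environment configuration described under
+[][Special directory configuration]:
+
+---
+#include <glib.h>
+
+int
+main (void)
+{
+  /* g_get_user_special_dir() may return NULL if a directory is unset */
+  const gchar *music = g_get_user_special_dir (G_USER_DIRECTORY_MUSIC);
+  const gchar *downloads = g_get_user_special_dir (G_USER_DIRECTORY_DOWNLOAD);
+
+  g_print ("shared music: %s\n", music != NULL ? music : "(unset)");
+  g_print ("downloads:    %s\n", downloads != NULL ? downloads : "(unset)");
+  return 0;
+}
+---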
+
+### Implementation
+
+Application bundles should be installed according to the
+[Apertis Application Bundle Specification]. This document does not duplicate
+the information provided in that specification, but only gives rationale.
+
+The split between `/Applications` or `/usr/Applications` for static data,
+and `/var/Applications` for variable data, makes it easy for developers
+and AppArmor profiles to distinguish between static and variable data. It
+also results in the two different algorithms used during upgrade for
+store apps being applied to different directories.
+
+The additional split between `/Applications` for store application
+bundles, and `/usr/Applications` for built-in application bundles,
+serves two purposes:
+
+* `/usr` is part of the *system partition*, which is read-only at runtime
+  (for robustness), contains the platform and built-in application bundles,
+  and has a limited storage quota because the safe upgrade/rollback
+  mechanism means it appears on-disk twice. `/Applications` is part of
+  the *general storage partition*, which has a more generous storage quota
+  and is read/write at runtime.
+* Using a distinctive prefix for built-in
+  application bundles makes it trivial to identify built-in applications
+  from their AppArmor profile names, which are conventionally linked to
+  the programs' filenames.
+
+The specified layout was chosen so that the
+static files in `share/` and `lib/` could be
+organised in the way that would be conventional for Automake
+installation with a `--prefix=/Applications/$bundle_id`
+or `--prefix=/usr/Applications/$bundle_id` option. For
+example, because the app icon in a store app bundle is named something like
+`/Applications/$bundle_id/share/icons/hicolor/$size/apps/$entry_point_id.png`,
+it can be installed to
+`${datadir}/icons/hicolor/$size/apps/$entry_point_id.png` in the usual way.
+
+If there are any non-Automake-based application bundles, they should be
+configured to install in the same GNU-style directory hierarchy that we
+would use with Automake, with the analogous parameter corresponding
+to `${prefix}`. We do not recommend
+distributing non-Automake-based application bundles.
+
+The top-level `config`, `cache`, `data` directories within
+the bundle's variable data should only be created if the application
+bundle has special permissions flags. `config`, `cache`, `data` should
+be considered to be a minor "red flag" by [app-store curators][App Store
+Approval]: because they share data across user boundaries, they come with
+some risk.
+
+#### System integration links for built-in applications
+
+The `.deb` package for built-in applications should also include symbolic
+links for the following system integration files:
+
+* Entry points: `/usr/share/applications/*.desktop` →
+  `/usr/Applications/$bundle_id/share/applications/*.desktop`
+* Icons: `/usr/share/icons/*` → `/usr/Applications/$bundle_id/share/icons/*`
+* Other theme files: `/usr/share/themes/*` →
+  `/usr/Applications/$bundle_id/share/themes/*`
+
+Store applications must not contain these links:
+similar links are created at install-time instead. See
+[][Store application system integration links] for details.
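+
+This filename convention also gives platform services a cheap way to map
+an AppArmor profile name back to a bundle ID, as required under
+[][Security and privacy considerations]. The helper below is a purely
+illustrative sketch, not an actual Canterbury API:
+
+---
+#include <string.h>
+#include <glib.h>
+
+/* Illustrative only: derive a bundle ID from a profile name such as
+ * "/Applications/com.example.MyApp/bin/my-app", relying on the path
+ * conventions recommended in this document. */
+static gchar *
+bundle_id_from_profile (const gchar *profile)
+{
+  const gchar *rest;
+
+  if (g_str_has_prefix (profile, "/Applications/"))
+    rest = profile + strlen ("/Applications/");      /* store bundle */
+  else if (g_str_has_prefix (profile, "/usr/Applications/"))
+    rest = profile + strlen ("/usr/Applications/");  /* built-in bundle */
+  else
+    return NULL;                                     /* platform component */
+
+  /* The bundle ID is the first path component after the prefix */
+  return g_strndup (rest, strcspn (rest, "/"));
+}
+---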
+ +#### Special directory configuration + +Programs in store application bundles should be run with these environment +variables, so that they automatically use appropriate directories: + +* `XDG_DATA_HOME=/var/Applications/$bundle_id/users/$uid/data` (used by + `g_get_user_data_dir`) +* `XDG_DATA_DIRS=/Applications/$bundle_id/share:/var/lib/apertis_extensions/public:/usr/share` + (used by `g_get_system_data_dirs`) + * See [][Store application system integration links] for the + rationale for `/var/lib/apertis_extensions/public` +* `XDG_CONFIG_HOME=/var/Applications/$bundle_id/users/$uid/config` + (used by `g_get_user_config_dir`) +* `XDG_CONFIG_DIRS=/var/Applications/$bundle_id/etc/xdg:/Applications/$bundle_id/etc/xdg:/etc/xdg` + (used by `g_get_system_config_dirs`) +* `XDG_CACHE_HOME=/var/Applications/$bundle_id/users/$uid/cache` (used by + `g_get_user_cache_dir`) +* `PATH=/Applications/$bundle_id/bin:/usr/bin:/bin` (used when executing + programs) +* `XDG_RUNTIME_DIR=/run/user/$uid` (used by `g_get_user_runtime_dir` + and provided automatically by systemd; access is subject to a "whitelist") + +**Unresolved:** [][Should `LD_LIBRARY_PATH` be set?] + +This is automatically done by `canterbury-exec` in Apertis 16.06 or later, +unless the entry point's bundle ID cannot be determined from its `.desktop` +file. For backwards compatibility, Canterbury in Apertis 16.09 still attempts +to run entry points whose bundle ID cannot be determined, but this should +be prevented in future. + +Built-in application bundles should be given the same environment +variables, but with `/usr/Applications` replacing `/Applications`. + +**Unresolved:** [][Is `g_get_home_dir()` bundle-independent?] + +**Unresolved:** [][Is `g_get_temp_dir()` bundle-independent?] + +In addition, the XDG special directories should be configured as follows +for both built-in and store application bundles: + +* `g_get_user_special_dir (G_USER_DIRECTORY_DESKTOP)`: **Unresolved**: + [][What is the scope of `DESKTOP`, `DOCUMENTS`, `TEMPLATES`?] +* `g_get_user_special_dir (G_USER_DIRECTORY_DOCUMENTS)`: **Unresolved**: + [][What is the scope of `DESKTOP`, `DOCUMENTS`, `TEMPLATES`?] +* `g_get_user_special_dir (G_USER_DIRECTORY_DOWNLOAD)`: + `/var/Applications/$bundle_id/users/$uid/downloads` +* `g_get_user_special_dir (G_USER_DIRECTORY_MUSIC)`: `/home/shared/Music` +* `g_get_user_special_dir (G_USER_DIRECTORY_PICTURES)`: **Unresolved**: + [][Is `PICTURES` per-user?] +* `g_get_user_special_dir (G_USER_DIRECTORY_PUBLIC_SHARE)`: `/home/shared` +* `g_get_user_special_dir (G_USER_DIRECTORY_TEMPLATES)`: **Unresolved**: + [][What is the scope of `DESKTOP`, `DOCUMENTS`, `TEMPLATES`?] +* `g_get_user_special_dir (G_USER_DIRECTORY_VIDEOS)`: + `/home/shared/Videos` + +Again, this is automatically done by `canterbury-exec` in Apertis 16.06 or +later. + +### Permissions and ownership + +All files under `/usr/Applications` and `/Applications` should be owned +by root, with the standard system permissions (`u=rwX,og=rX` — that is, +root may write, and all users may read all files, execute programs that +are marked executable and enter directories). + +`/var/Applications`, `/var/Applications/$bundle_id` and +`/var/Applications/$bundle_id/users/` are also owned by root, with the +standard system permissions. + +If they exist, `/var/Applications/$bundle_id/{config,data,cache}/` are +owned by root, with permissions `a=rwx`. If they are not required and +allowed by a permissions flag, they must not exist. 
+
+**Unresolved**: [][Can we prevent symlink attacks in shared directories?]
+
+`/var/Applications/$bundle_id/users/$uid/` and all of its subdirectories
+are owned by `$uid`, with permissions `u=rwx,og-rwx` for privacy (in
+other words, only accessible by the owner or by root).
+
+### Physical layout
+
+The application-visible directories in `/var/Applications` and `/Applications`
+are only mount points. Applications' real storage is situated on the general
+storage volume, in the following layout:
+
+---
+<general storage volume>
+├─app-bundles/
+│ ├─com.example.MyApp/ (store app-bundle)
+│ │ ├─current → version-1.2.2-1 (symbolic link)
+│ │ ├─rollback → version-1.0.8-2 (symbolic link)
+│ │ ├─version-1.0.8-2/
+│ │ │ ├─static/ (subvolume)
+│ │ │ │ ├─bin/
+│ │ │ │ └─share/ (etc.)
+│ │ │ └─variable/ (subvolume)
+│ │ │   └─users/
+│ │ │     └─1001/
+│ │ │       ├─cache/
+│ │ │       ├─config/
+│ │ │       └─data/ (etc.)
+│ │ └─version-1.2.2-1/
+│ │   ├─static/ (subvolume)
+│ │   └─variable/ (subvolume)
+│ └─org.apertis.Frampton/ (built-in app-bundle)
+│   ├─current → version-2.5.1-1 (symbolic link)
+│   └─version-2.5.1-1/
+│     └─variable/ (subvolume)
+… <other directories and subvolumes unrelated to application bundles>
+---
+
+The `static` and `variable` directories are `btrfs` subvolumes so that they
+can be copied using snapshots, while the other directories shown may be
+either subvolumes or ordinary directories. The `current` and `rollback`
+symbolic links indicate the currently active version, and the version to
+which a rollback would move, respectively.
+
+Built-in application bundles do not have a `static` subvolume, because their
+static files are part of `/usr` on the read-only operating system volume.
+
+All other filenames in this hierarchy are reserved for the application manager,
+which may create temporary directories and symbolic links during its operation.
+It must create these in such a way that it can recover from abrupt power loss
+at any point, for example by making careful use of POSIX atomic filesystem
+operations to implement "transactions".
+
+During normal operation, the subvolumes would be mounted as follows:
+
+---
+com.example.MyApp/current/static → /Applications/com.example.MyApp
+com.example.MyApp/current/variable → /var/Applications/com.example.MyApp
+org.apertis.Frampton/current/variable → /var/Applications/org.apertis.Frampton
+---
+
+so that the expected paths such as
+`/var/Applications/com.example.MyApp/users/1001/config/` are made available.
+
+Only one version per application is mounted at a time: under normal
+circumstances, this will be the one with the highest version. After a
+system rollback it might be an older version, if the most recent is
+unlaunchable.
+
+### Installation and upgrading
+
+Suppose we are installing `com.example.MyApp` version 2, or upgrading
+it from version 1 to version 2.
+An optimal implementation would look
+something like this:
+
+* If it was already installed:
+  * Instruct any running processes belonging to that bundle to exit
+  * Wait for the processes to save their state and exit; if a timeout
+    is reached, kill the processes
+  * Unmount the `com.example.MyApp/version-1/static` subvolume from
+    `/Applications/com.example.MyApp`
+  * Unmount the `com.example.MyApp/version-1/variable` subvolume from
+    `/var/Applications/com.example.MyApp`
+  * Create a snapshot of `com.example.MyApp/version-1/static` named
+    `com.example.MyApp/version-2/static`
+  * Create a snapshot of `com.example.MyApp/version-1/variable` named
+    `com.example.MyApp/version-2/variable`
+  * Recursively delete the `cache` and `users/*/cache` directories from
+    `com.example.MyApp/version-1/variable`
+* If it was not already installed, instead:
+  * Create a new, empty subvolume `com.example.MyApp/version-2/variable`
+    to be mounted at `/var/Applications/com.example.MyApp`
+  * Create a new, empty subvolume `com.example.MyApp/version-2/static`
+    to be mounted at `/Applications/com.example.MyApp`
+* For each existing static file in `com.example.MyApp/version-2/static` that
+  was carried over from `com.example.MyApp/version-1/static`:
+  * If there is no corresponding file in version 2, delete it
+  * If its contents do not match the corresponding file in version 2,
+    delete it
+  * If its metadata do not match those in version 2, update the
+    metadata
+* For each static file in version 2:
+  * If there is no corresponding file in `com.example.MyApp/version-2/static`,
+    the file is either new or changed. Unpack the new version.
+* *(Optional, if support for this feature is required)* Copy any files
+  required from `share/factory/{etc,var}` to `{etc,var}`, overwriting files
+  retained from previous versions if and only if the retained version
+  matches what is in version 1's `share/factory/{etc,var}` but does not
+  match version 2's `share/factory/{etc,var}`.
+
+A simpler procedure would be to create the `com.example.MyApp/version-2/static`
+subvolume as empty, and then unpack all of the static files from the
+new version. However, that procedure would not provide de-duplication
+between consecutive versions if a file has not changed.
+As of Apertis 16.09, only this simpler procedure has been implemented.
+
+Ribchester (and perhaps Canterbury) must create the per-user directories
+`/var/Applications/$bundle_id/users/$uid`; this was implemented in
+Apertis 16.06.
+
+#### Store application system integration links
+
+Application installation for store applications may
+set up symbolic links in `/var/lib/apertis_extensions`
+for the categories of system integration files described in
+[][System integration links for built-in applications], but the files and
+their contents must be [restricted][App Store Approval] unless the bundle
+has special permissions flags. In particular, all entry points (agents
+and applications) in a bundle must be in the relevant [ISV]'s namespace.
+ +For example, an application bundle containing a user interface and an +agent could be linked like this: + +* `/var/lib/apertis_extensions/applications/com.example.MyApp.UI.desktop` + → `/Applications/com.example.MyApp/share/applications/com.example.MyApp.UI.desktop` +* `/var/lib/apertis_extensions/applications/com.example.MyApp.Agent.desktop` + → `/Applications/com.example.MyApp/share/applications/com.example.MyApp.Agent.desktop` + +The designers of Apertis can introduce new system integration points in +future versions if required. + +The platform components that need to support loading "extension" +components from store application bundles will be modified +or configured to look in `/var/lib/apertis_extensions`. For +example, Canterbury could be run with +`XDG_DATA_DIRS=/var/lib/apertis_extensions:/usr/share` +so that it will pick up activatable services from +`/var/lib/apertis_extensions/dbus-1/services`. + +#### System integration links for public extensions + +`/var/lib/apertis_extensions` should *not* be included in the +`XDG_DATA_DIRS` for store applications, so that store applications do not +automatically attempt to read these restricted directories and receive +AppArmor denials. However, a few types of system extension should be +loaded by all programs, not just privileged platform components. For +example, GUI themes would typically provide icons in `$datadir/icons` +and other related files in `$datadir/themes`, which are intended to be +loaded by arbitrary applications (so that those applications coordinate +with the theme). + +We recommend that the system bind-mounts or copies these files into the +corresponding subdirectory of `/var/lib/apertis_extensions/public`. In +conjunction with the environment variables described above, this means +that libraries and applications that follow the [XDG Base Directory +specification], for example Gtk's theme support, will load them +automatically. + +Please note that symbolic links are *not* suitable for public extensions, +because AppArmor access-control is based on the result of dereferencing +the symbolic link: if a store application `com.example.ShoppingList` +renders widgets using the `org.example.metallic` theme, it would +not be allowed to read through a symbolic link that points into +`/Applications/org.example.metallic/share/themes/org.example.metallic/`, +but it can be allowed to read the same directory +indirectly by bind-mounting that directory onto +`/var/lib/apertis_extensions/public/themes/org.example.metallic/`. + +### Uninstallation + +* Uninstalling a store application bundle consists of removing + `/Applications/$bundle_id`, `/var/Applications/$bundle_id` and the + corresponding subvolumes. +* Uninstalling a built-in application bundle is not possible, but it can + be reset (equivalent to uninstallation and reinstallation) by deleting + and re-creating `/var/Applications/$bundle_id` and its corresponding + subvolumes. +* Deleting a user should delete every directory matching + `/var/Applications/*/users/$uid`, in addition to the user's home + directory. 
+* A "data reset" consists of:
+  * deleting and re-creating `/var/Applications/$bundle_id` for every
+    application bundle
+  * *(optional, if a data reset is intended to uninstall store app
+    bundles)* clearing `/Applications`
+  * *(optional, if this feature is required)* populating `{etc,var}`
+    from `share/factory/{etc,var}` as if for initial installation
+
+### AppArmor profiles
+
+Every application bundle should have rules similar to these in its
+AppArmor profile:
+
+* `#include <abstractions/chaiwala-base>` (normal "safe" functionality)
+* `/{usr/,}Applications/$bundle_id/{bin,lib,libexec}/** mr` (map libraries
+  and the executable described by the profile; read arch-dependent static
+  files)
+* `/{usr/,}Applications/$bundle_id/{bin,libexec}/** pix` (run other
+  executables from the same bundle under their own profile, or inherit
+  current profile if they do not have their own)
+* `/{usr/,}Applications/$bundle_id/share/** r` (read arch-independent
+  static files)
+* `owner /var/Applications/$bundle_id/users/** rwk` (read, write and
+  lock per-app, per-user files for the user running the app)
+
+Note that a write is only allowed if it is allowed by both AppArmor
+and file permissions, so user A is normally prevented from accessing
+user B's files by file permissions. The last rule is given the `owner`
+keyword only for completeness.
+
+Application bundles that require them may additionally have rules similar
+to these:
+
+* `/var/Applications/$bundle_id/{config,data,cache}/** rwk` (read, write,
+  lock per-bundle, cross-user variable files)
+* `/home/shared/{Music,Videos}/{,**} rwk` (read, write, lock cross-bundle,
+  cross-user media files)
+* `/home/shared/{,**} rwk` (read, write, lock all cross-bundle,
+  cross-user files)
+* `owner /home/*/$something rwk` (read, write, lock selected cross-bundle,
+  per-user files for the user running the app)
+
+`<abstractions/chaiwala-base>` should be modified to include
+
+* `/var/lib/apertis_extensions/public/** r`
+
+to support public extensions.
+
+## Unresolved design questions
+
+### Are downloads rolled back?
+
+Newport stores downloaded files in a directory per (bundle ID, user)
+pair. When an app is rolled back, are those files treated like a cache
+(deleted), or treated like user data (also rolled back), or left as
+they are?
+
+### Does data reset uninstall apps?
+
+Does a data reset leave the installed store apps installed, or does it
+uninstall them all? (In other words, does it leave store apps' static
+files intact, or does it delete them?)
+
+### Are inactive themes visible to all?
+
+Suppose the system-wide theme is "blue", and the user has installed but
+not activated "red" and "green" themes from the app store. Is it OK for
+an unprivileged app-bundle to be able to see that the "red" and "green"
+themes exist?
+
+* The same applies to any other [][Public system extensions].
+* For simplicity, we recommend the answer "yes, this is acceptable"
+  unless there is a reason to do otherwise.
+
+### Are built-in bundles visible to all?
+
+We know that unprivileged app-bundles are not allowed to enumerate the
+store application bundles that are installed. Is it OK for an unprivileged
+app-bundle to be allowed to enumerate the built-in application bundles?
+
+* For simplicity, we recommend the answer "yes, this is acceptable"
+  unless there is a reason to do otherwise.
+
+### Standard icon sizes?
+
+Are there specific icon sizes that we want to require every app to
+supply? As of November 2015, the "Mildenhall" reference HMI uses 36x36
+icons.
+Launchers should be prepared to scale icons as a fallback, but
+scaled icons at small pixel sizes tend to look blurry and low-quality,
+so icons of exactly the size required for the HMI should be preferred.
+
+### How do bundles discover the per-user, bundle-independent location?
+
+The precise location to be used for per-user, bundle-independent data,
+and the API to get it, has not been decided.
+
+### Is `g_get_home_dir()` bundle-independent?
+
+It is undecided whether the `HOME` environment variable and
+`g_get_home_dir()` should point to `/home/$user`, or to a per-user,
+per-bundle location. If those point to a per-user, per-bundle location,
+then a separate API will need to be provided by libcanterbury with which
+a program can access per-user, bundle-independent data.
+
+### Is `g_get_temp_dir()` bundle-independent?
+
+It is undecided whether the `TMPDIR` environment variable and
+`g_get_temp_dir()` should point to `/tmp` as they normally do, or to a
+per-user, per-bundle location.
+
+### Is `PICTURES` per-user?
+
+Should `G_USER_DIRECTORY_PICTURES` be shared between users and between
+bundles like `G_USER_DIRECTORY_MUSIC` and `G_USER_DIRECTORY_VIDEOS`, or
+should it be per-user like `$HOME`, or should it be per-user per-bundle
+like `g_get_user_cache_dir()`?
+
+As of Apertis 16.06, it has been implemented as shared, like
+`G_USER_DIRECTORY_MUSIC`.
+
+### What is the scope of `DESKTOP`, `DOCUMENTS`, `TEMPLATES`?
+
+What should the scope of `G_USER_DIRECTORY_DESKTOP`,
+`G_USER_DIRECTORY_DOCUMENTS` and `G_USER_DIRECTORY_TEMPLATES` be? Or should
+we declare these to be unsupported on Apertis, and set them to the same
+place as `$HOME` as documented by their specification?
+
+As of Apertis 16.06, these were marked as unsupported and set to be the
+same as `$HOME`.
+
+## Unresolved implementation questions
+
+### Can we prevent symlink attacks in shared directories?
+
+Can we use AppArmor to prevent the creation of symbolic links in
+directories that are shared between users or between bundles, so that
+applications do not need to take precautions to avoid writing through
+a symbolic link, which could allow one trust domain to make another
+trust domain overwrite a chosen file if the writing application is
+insufficiently careful? We probably cannot use `+t` permissions (the
+"sticky bit", which activates restricted deletion and symlink protection),
+because that would prevent one user from deleting a file created by
+another user, which is undesired here.
+
+### Should `LD_LIBRARY_PATH` be set?
+
+The Autotools build system (`autoconf`, `automake` and `libtool`) will
+automatically configure executables to load libraries built from the same
+source tree in their installed locations, using the `DT_RPATH` ELF header,
+so it is unnecessary to set `LD_LIBRARY_PATH`.
+
+However, we might wish to set `LD_LIBRARY_PATH=/Applications/${bundle_id}/lib`
+(or the obvious `/usr/Applications` equivalent) so that app-bundles built with
+a non-Automake build system will "just work".
+
+Similarly, we might wish to set
+`GI_TYPELIB_PATH=/Applications/${bundle_id}/lib/girepository-1.0` for
+app-bundles that use GObject-Introspection.
+
+## Alternative designs
+
+### Merge static and variable files for store applications
+
+One option that was considered was to separate the read-only parts of
+built-in application bundles (in `/usr/Applications`) from the read/write
+parts (in `/Applications`), but not separate the read-only parts of
+store application bundles (in `/Applications`) from the read/write parts
+(also in `/Applications`).
+
+This reduces the number of subvolumes (one subvolume per store bundle
+instead of two), but requires additional complexity in the store
+bundle installer: it would have to distinguish between the static data
+directories (`bin`, `share`, etc.) and the variable data directories
+(`cache`, `users`, etc.) by name.
+
+### Add a third subvolume per app-bundle for cache
+
+Conversely, because cache files are not rolled back, we could consider
+separating disposable cache files from the other read/write parts; they
+would not be subject to snapshots, and during a rollback, the cache
+subvolume would simply be deleted and re-created.
+
+### Each user's files under their `$HOME`
+
+This strategy is not recommended, and is only mentioned here to document
+why we have not taken it.
+
+The recommendations above keep all users' variable files for a given
+application bundle, and any variable files for that bundle that are
+shared among all users, together. An alternative design that we could
+have used would be to keep all of a user's variable files, across all
+bundles, in one place (for example their home directory, `$HOME`).
+
+Because store application bundles can be rolled back independently, each
+user would need at least one subvolume per store application bundle plus
+one subvolume for built-in application bundles, so that the chosen store
+application bundle's data area could be rolled back without affecting
+other bundles.
+
+The reason that this design was rejected is that it scales poorly in
+some cases, including the one that we expect to be most frequent (store
+app-bundle installation and uninstallation). While it does require fewer
+subvolume manipulations than the recommended design for some operations,
+those operations are expected to be rare. To illustrate this, suppose
+we have 10 built-in bundles, 20 store bundles and 5 users.
+
+Suppose we install, upgrade or remove the store bundle `com.example.MyApp`,
+which additionally has some variable files that are shared between
+users. With the recommended design, we only have to perform O(1)
+subvolume operations (two, one if we
+[][Merge static and variable files for store applications], or three
+if we [][Add a third subvolume per app-bundle for cache]). In this
+alternative design, we would have to perform O(number of users) subvolume
+operations, in this case 7: one for the bundle's static files, one for
+its variable files shared between users, and one per user.
+
+Similarly, when we upgrade the platform and we wish to take a snapshot
+of each built-in application's data, the recommended design requires
+us to take 10 snapshots (more generally O(1), one per built-in bundle),
+whereas this alternative requires 50-60 snapshots (more generally O(number
+of users), one per built-in bundle per user, and zero or one per built-in
+bundle for non-user-specific data).
+
+If we add or delete a user, in the recommended design we would have
+to perform 31 subvolume operations, or more generally O(number of
+bundles): one per store or built-in bundle, plus one extra operation for
+non-bundle-specific data.
In this alternative we would need a minimum of +22 subvolume operations, or more generally O(number of store bundles): +one per store bundle, one for all built-in bundles together, and one +for non-bundle-specific data. + +If we perform a data reset without uninstalling store app bundles, +the recommended design would require at least 30 subvolume deletions +(one per application bundle), whereas this design would require at least +150 subvolume deletions (one per bundle per user). + +### System integration links for services + +It would be technically possible to install user-services (services that +run as a particular user, similar to Tracker) in an application bundle, +and register them with the wider system via system integration links +([][System integration links for built-in applications], +[][Store application system integration links]) +pointing to their systemd user services and D-Bus session services. + +We recommend that this is not done, because general systemd user +services are powerful and have a global effect. Instead, we recommend +that per-app-bundle user-services (agents) are implemented by having the +application manager (Canterbury) generate a carefully constrained +subset of service file syntax from the entry point metadata. + +### System services in app-bundles + +It would be technically possible to install system services (services that +do not run as a specific user) in an application bundle, registering them +via system integration links as above. + +We recommend that this is not done, because system services are extremely +powerful and can have extensive privileges. Instead, system services should +be part of the [platform] layer. + +## Appendix: application layout in Apertis 15.09 + +Sudoku is one example of a store application bundle. Its source code is +not currently public. `xyz` is used here to represent the common prefix +for an Apertis variant. The layout of the store application bundle looks +like this: + +--- +/appstore/ + store.json + store.sig + xyz-sudoku_config.tar + xyz-sudoku_config/ + xyz-sudoku.png + xyz-sudoku_manifest.json +/xyz-sudoku.tar + xyz-sudoku/ + bin/ + xyz-sudoku + share + glib-2.0 + schemas + com.app.xyz-sudoku.gschema.xml + com.app.xyz-sudoku.enums.xml + gschemas.compiled + background.png + icon_sudoku.png + (more graphics) +--- + +The manifest indicates that `/xyz-sudoku.tar` is expected to +be unpacked into `/Applications`, leading to filenames like +`/Applications/xyz-sudoku/bin/xyz-sudoku`. + +[Frampton] is an example of a built-in application bundle shipped in +15.09. Its layout is as follows: + +--- +/usr/ + Applications/ + frampton/ + bin/ + frampton + frampton-agent + test-frampton-agent + lib/ + libframptonagentiface.so{,.0,.0.0.0} + share/ + IconBig_Music.png + icon_albums_inactive.png + ... + artist-album-views/ + DetailView.json + ... + glib-2.0/ + schemas/ + com.app.frampton-agent.gschema.xml + ... + locale/ + de/ + ... +/Applications/ + Frampton/ + app-data/ + Internal/ + FramptonAgent.db + frampton/ + app-data/ + (empty) +--- + +Issues with the application filesystem layout in these examples: + +* There is no "manifest" file with metadata for the built-in application + bundle as a whole. +* The "manifest" files for entry points in both store and built-in + applications are GSettings schema XML, which is not how GSettings + is designed to be used. They are also incorrectly namespaced: the app + developer presumably does not own `app.com`. 
+  We should use `org.apertis.*`
+  for Apertis components, `{com,net,org}.example.*` for developer examples,
+  and a vendor's name elsewhere.
+* There is no separation between users: the `user` account owns all of
+  `/Applications`.
+* Frampton's app bundle ID is ambiguous: is it `Frampton` or `frampton`? We
+  should choose exactly one ID, and make the AppArmor profile forbid using
+  the other.
+* Frampton's app bundle ID is not namespaced. The [Applications
+  design document] specifies use of a [reversed domain name] such as
+  `org.apertis.Frampton`.
+* Similarly, Sudoku's app bundle ID is not namespaced.
+* There is no well-known location for apps' icons: Frampton places
+  its icons in `/usr/Applications/frampton/share/`, but other apps use
+  `/usr/Applications/$bundle_id/share/images`, requiring mildenhall-launcher
+  to be allowed to read both locations.
+* There is no well-known location into which Newport may download files.
+
+## Appendix: comparison with other systems
+
+### Desktop Linux (packaged apps)
+
+There are many possibilities, but a common coding standard looks like this:
+
+* Main programs are installed in `$bindir` (which is set to `/usr/bin`)
+* Supporting programs are installed in `$libexecdir` (which is set
+  to either `/usr/libexec` or `/usr/lib`), often in a subdirectory per
+  application package
+* Public shared libraries are installed in `$libdir` (which is set to
+  either `/usr/lib` or `/usr/lib64` or `/usr/lib/$architecture`)
+  * Plugins are installed in a subdirectory of `$libdir`
+  * Private shared libraries are installed in a subdirectory of
+    `$libdir`
+* `.gresource` resource bundles (and any resource files that cannot use
+  GResource) are installed in `$datadir`, which is set to `/usr/share`
+* System-level configuration is installed in a subdirectory of
+  `$sysconfdir`, which is set to `/etc`
+* System-level variable data is installed in `$localstatedir/lib/$package`
+  and `$localstatedir/cache/$package`, with `$localstatedir` set to `/var`
+* There is normally no technical protection between apps, but
+  per-user variable data is stored according to the [XDG Base Directory
+  specification] in:
+  * `$XDG_CONFIG_HOME/$package`, defaulting to
+    `/home/$username/.config/$package`, where `$username` is the user's
+    login name and `$package` is the short name of the application or
+    package
+  * `$XDG_DATA_HOME/$package`, defaulting to
+    `/home/$username/.local/share/$package`
+  * `$XDG_CACHE_HOME/$package`, defaulting to
+    `/home/$username/.cache/$package`
+* The user's home directory, normally `/home/$username`, is shared
+  between apps but private to the user
+  * It is usually technically possible for one app to alter another
+    app's subdirectories of `$XDG_CONFIG_HOME` etc.
+* There is no standard location that can be read and written by all users,
+  other than temporary directories which are not intended to be shared
+
+[Debian Policy §9.1 "File system hierarchy"] describes the policy
+followed on Debian and Ubuntu systems for non-user-specific data. It
+references the [Filesystem Hierarchy Standard, version 2.3].
+
+Similar documents:
+
+* The [Filesystem Hierarchy Standard, version 3.0] has not yet been
+  adopted by Debian Policy.
+* The [GNU Coding Standards][GNU Coding Standards Directory Variables]
+  use a similar layout by default.
+* [systemd's proposals for file hierarchy] have been partially adopted
+  by Linux distributions.
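+
+All of the per-user locations above follow the same rule: honour the
+environment variable when it is set and non-empty, otherwise fall back
+to a fixed subdirectory of the home directory. A sketch of that rule,
+which GLib's `g_get_user_config_dir()` implements internally:
+
+---
+#include <glib.h>
+
+/* XDG Base Directory fallback: $XDG_CONFIG_HOME, else ~/.config */
+static gchar *
+xdg_config_home (void)
+{
+  const gchar *env = g_getenv ("XDG_CONFIG_HOME");
+
+  if (env != NULL && env[0] != '\0')
+    return g_strdup (env);
+
+  return g_build_filename (g_get_home_dir (), ".config", NULL);
+}
+
+int
+main (void)
+{
+  gchar *dir = xdg_config_home ();
+
+  g_print ("config home: %s\n", dir);
+  g_free (dir);
+  return 0;
+}
+---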
+ +### Flatpak + +Autoconf/Automake software in a [Flatpak](http://flatpak.org/) package is built with +`--prefix=/app`, and the static files of the app are mounted at `/app` +inside the sandbox. Each Flatpak has its own private view of the +filesystem inside its sandbox, so this does not lead to conflict over +ownership of `/app` as might be expected. + +* Main programs are installed in `$bindir`, which is `/app/bin` +* Supporting programs are installed in `$libexecdir`, which is + `/app/libexec` +* Private shared libraries are installed in `$libdir`, which is + `/app/lib`, or in a subdirectory + * Plugins are installed in a subdirectory of `$libdir` +* Static resources are embedded using GResource, installed in `/app/share` + as a `.gresource` resource bundle, or installed in `/app/share` as +plain files +* System-level configuration is installed in `/app/etc` +* Per-user variable data is stored in + `/home/$username/.var/app/$app_id/{data,config,cache}`, which + are bind-mounted into the app's filesystem namespace, with the + `XDG_{DATA,CONFIG,CACHE}_HOME` environment variables set to point at + those locations +* Shared variable data is stored in `/var/lib/$app_id`, + `/var/cache/$app_id`. *(How widely shared is this really?)* + +Integration files (systemd units, D-Bus services, etc.) are +said to be *exported* by the Flatpak, and they are linked into +`$XDG_DATA_HOME/flatpak/exports` or `/var/lib/flatpak/exports` outside +the sandbox. + +*Runtimes* (sets of libraries) are mounted at `/usr` inside the sandbox. + +### Android + +* System app packages (the equivalent of our [built-in application + bundles]) are stored in `/system/app/$package.apk` +* Normal app packages (the equivalent of our [store application bundles]) + are stored in `/data/app/$package.apk` +* Private shared libraries and plugins (and, technically, + any other supporting files) are automatically unpacked into + `/data/data/$package/lib/` by the OS +* Resource files are loaded from inside the `.apk` file (analogous to + GResource) instead of existing as files in the filesystem +* Per-user variable data is stored in `/data/data/$package/` on + single-user devices +* Per-user variable data is stored in `/data/user/$user/$package/` + on multi-user devices +* There is no location that is private to an app but shared between + users. The closest equivalent is `/sdcard/$package`, which is + conventionally only used by the app `$package`, but is technically + accessible to all apps. +* There is no location that is shared between apps but private to a user. +* `/sdcard` is shared between apps and between users. Large data files + such as music and videos are normally stored here. + +### systemd "revisiting Linux systems" proposal + +The [authors of systemd propose a structure like this][systemd "revisiting Linux systems" proposal]. At the time of +writing, no implementations of this idea are known. + +* The static files of application bundles are installed in a subvolume + named `app:$bundle_id:$runtime:$architecture:$version`, where: + * `$bundle_id` is a reversed domain name identifying the app bundle itself + * `$runtime` identifies the set of runtime libraries needed by the + application bundle (in our case it might be `org.apertis.r15_09`) + * `$architecture` represents the CPU architecture + * `$version` represents the version number +* That subvolume is mounted at `/opt/$bundle_id` in the app sandbox. The + corresponding runtime is mounted at `/usr`. 
+* User-specific variable files are in a subvolume named, for example, + `home:user:1000:1000` which is mounted at `/home/user`. +* System-level variable files go in `/etc` and `/var` as usual. +* There is currently no concrete proposal for a trust boundary between + apps: all apps are assumed to have full access to `/home`. +* There is no location that is private to an app but shared between users. +* There is no location that is shared between apps and between users, + other than removable media. + +## References + +* [Applications design document] (v0.5.4 used) +* [Multimedia design document] (v0.5.4 used) +* [Security design document] (v1.1.3 used) +* [System Update and Rollback design document] (v1.6.2 used) + +<!-- Other documents --> + +[Apertis Application Bundle Specification]: https://appdev.apertis.org/documentation/bundle-spec.html +[Applications design]: applications.md +[Applications design document]: applications.md +[Application Entry Points]: https://wiki.apertis.org/Application_Entry_Points +[App store approval]: https://wiki.apertis.org/App_Store_Approval +[Interface discovery]: https://wiki.apertis.org/Interface_discovery +[Multimedia design]: multimedia.md +[Multimedia design document]: multimedia.md +[Multi-user design]: multiuser.md +[Multi-user design document]: multiuser.md +[Preferences and Persistence design document]: preferences-and-persistence.md +[System Update and Rollback design]: system-updates-and-rollback.md +[System Update and Rollback design document]: system-updates-and-rollback.md +[Security design]: security.md +[Security design document]: security.md + +<!-- Glossary --> + +[Built-in application bundle]: https://wiki.apertis.org/Glossary#built-in-application-bundle +[Built-in application bundles]: https://wiki.apertis.org/Glossary#built-in-application-bundle +[ISV]: https://wiki.apertis.org/Glossary#isv +[Platform]: https://wiki.apertis.org/Glossary#platform +[Program]: https://wiki.apertis.org/Glossary#program +[Programs]: https://wiki.apertis.org/Glossary#program +[Reversed domain name]: https://wiki.apertis.org/Glossary#reversed-domain-name +[Reversed domain names]: https://wiki.apertis.org/Glossary#reversed-domain-name +[Store application bundle]: https://wiki.apertis.org/Glossary#store-application-bundle +[Store application bundles]: https://wiki.apertis.org/Glossary#store-application-bundle +[User]: https://wiki.apertis.org/Glossary#user + +<!-- Other links --> + +[Bug 283]: https://bugs.apertis.org/show_bug.cgi?id=283 +[Debian Policy §9.1 "File system hierarchy"]: https://www.debian.org/doc/debian-policy/ch-opersys.html#s9.1 +[Filesystem Hierarchy Standard, version 2.3]: http://www.pathname.com/fhs/pub/fhs-2.3.html +[Filesystem Hierarchy Standard, version 3.0]: http://refspecs.linuxfoundation.org/FHS_3.0/fhs/index.html +[Frampton]: https://gitlab.apertis.org/appfw/frampton +[freedesktop.org Icon Theme Specification]: http://standards.freedesktop.org/icon-theme-spec/icon-theme-spec-latest.html +[GNU Coding Standards Directory Variables]: https://www.gnu.org/prep/standards/html_node/Directory-Variables.html#Directory-Variables +[GResource]: https://developer.gnome.org/gio/stable/GResource.html +[Lennart Poettering's proposal for stateless systems]: http://0pointer.net/blog/projects/stateless.html +[libframptonagentiface]: https://gitlab.apertis.org/appfw/frampton/tree/master/src/interface +[systemd's proposals for file hierarchy]: http://www.freedesktop.org/software/systemd/man/file-hierarchy.html +[systemd "revisiting Linux systems" 
proposal]: http://0pointer.net/blog/revisiting-how-we-put-together-linux-systems.html
+[XDG Base Directory Specification]: http://standards.freedesktop.org/basedir-spec/basedir-spec-latest.html
+
+<!-- vim:set linebreak: -->
diff --git a/content/designs/applications.md b/content/designs/applications.md
new file mode 100644
index 0000000000000000000000000000000000000000..114db3b5946cf67216a6a0de286e4086d960925c
--- /dev/null
+++ b/content/designs/applications.md
@@ -0,0 +1,1379 @@
+---
+title: Applications
+short-description: Overview of application handling by Apertis
+  (partially-implemented, appstore not available)
+authors:
+  - name: Derek Foreman
+  - name: Simon McVittie
+---
+
+# Applications
+
+This document is intended to give a high-level overview of application
+handling by Apertis. Topics handled include the storage of applications
+and related data on the device, the format of the distributable
+application bundle files, how they're integrated into the system, and
+how the system manages them at run-time. Topics related to the
+development of applications are covered by several other designs.
+
+Unfortunately, the term “application” has seen a lot of misuse in recent
+times. While many mobile devices have an “application store” that
+distributes “application packages”, what is actually in one of those
+packages may not fit any sensible definition of an application – as an
+example, on the Nokia N9 one can download a package from the application
+store that adds MSN Messenger capabilities to the existing chat
+application.
+
+To avoid ambiguity, this document will avoid using “application” as a
+jargon term. Instead, we use two distinct terms for separate concepts
+that could informally be referred to as applications: *graphical
+programs*, and *application bundles*. See [][Terminology].
+
+Apertis is a multiuser system; see the [](multiuser.md) design document for
+more on the specifics of the multiuser experience and the division of
+responsibilities between middleware and HMI elements.
+
+Apertis first shipped with a custom application framework to address the
+needs described in this document; see [](canterbury-legacy-application-framework.md).
+The custom legacy framework is in the process of being replaced with an
+evolution based on upstream components; see [](application-framework.md).
+
+## Traditional package managers are unfit for applications
+
+Apertis relies heavily on a traditional packaging system to compose
+the base OS. However, it does not rely on it to distribute the composed
+system, as that is not a good fit for the use-cases Apertis addresses;
+see [](system-updates-and-rollback.md) for more details.
+Similarly, a traditional packaging system is not a good fit for applications in
+Apertis since:
+
+  - Apertis relies on an immutable base OS to implement a robust update
+    mechanism; see [](system-updates-and-rollback.md) for more details.
+    This means that a traditional package manager is not used to distribute
+    updates in the field and that the writable application storage should
+    be kept separate from the read-only base OS.
+
+  - Application bundles don't depend on each other – this makes creating
+    a new special purpose package management solution much easier, and
+    removes the main reason for customizing an existing solution to fit
+    Apertis-specific needs.
+
+  - Much of the complexity in application bundle handling (DRM,
+    rollbacks, communicating security “permissions” to the user) is not
+    part of the existing package management tools, and is not
+    interesting to the upstream tool maintainers.
+
+  - Applications can have conflicting dependencies which can't be shipped
+    as part of the base OS and should be somehow bundled with the
+    application itself.
+
+## Terminology
+
+### Graphical program
+
+A ***graphical program*** is a program with its own UI drawing surface,
+managed by the system's window manager. This matches the sense with
+which “application” is traditionally used on desktop/laptop operating
+systems, for instance referring to Notepad or to Microsoft Word.
+
+### Bundle
+
+A ***bundle*** or ***application bundle*** is a group of functionally
+related components (be they services, data, or programs), installed as a
+unit. This matches the sense with which “app” is typically used on
+mobile platforms such as Android and iOS; for example, we would say that
+an Android .apk file contains a bundle. Some systems refer to this
+concept as a ***package***, but that term is strongly associated with
+dpkg/apt (.deb) packages in Debian-derived systems, so we have avoided
+that term in this document.
+
+### Store account
+
+The [][Digital rights management] section discusses ***store accounts***,
+anticipated to have a role analogous to Google Play accounts on Android
+or Apple Store accounts on iOS. If these accounts exist, we recommend
+against using the term “user” for them, since that would be easily
+confused with the users found in the Multiuser design document; it is
+not necessarily true that every user has access to a store account, or
+that every store account corresponds to only one user.
+
+## Software Categories
+
+The software in an Apertis device can be divided into three categories:
+*platform*, *built-in application bundles* and *store application
+bundles*. Of these categories, some store application bundles may be
+*preinstalled*.
+
+The ***platform*** comprises all the facilities used to boot up
+the device and perform basic system checks and restorations. It also
+includes the infrastructural services on which the applications rely,
+such as the session manager, window manager, message bus and
+configuration storage service, and the software libraries shared between
+components.
+
+***Built-in application bundles*** are components that have a structure
+analogous to that of an application bundle from the application store,
+but can only be upgraded as part of an operating system upgrade, not
+separately. This should include all software laid on top of the platform
+that is on the critical path of user-facing basic functionality, and
+hence cannot be removed or upgraded except by installing a new operating
+system; this might include basic software such as the browser, email
+reader and various settings management applications.
+
+The platform and built-in applications combine to make up ***essential
+software***: the bare minimum Apertis will always have installed.
+Essential software has strict requirements both in terms of reliability
+and security.
+
+***Store application bundles*** are application bundles developed by
+third-parties to be used as add-ons to the system: they are not part of
+the system image and are made available for installation through the
+application store instead. While they may be important to the user,
+their presence is not required to operate the device properly.
+
+It is important to note that store application bundles can be shipped
+pre-installed on the device, which provides OEMs with a flexible way of
+providing differentiation or a more complete user experience by default.
+
+### Pre-installed Applications
+
+On most software platforms there are two kinds of applications that come
+pre-installed on the device: what we call built-in application bundles
+and regular store application bundles. The difference between built-in
+application bundles and regular store application bundles that just
+happen to come pre-installed is essentially that the former are
+considered part of the system's basic functionality, are updated along
+with the system and cannot be removed.
+
+Taking Apple's iPad as an example, we can see that approach being
+applied: Safari, Weather, Mail, Camera and so on are built into the
+system.
+
+> See <http://www.apple.com/ipad/built-in-apps/> for a list
+
+They cannot be removed and they are updated through system
+updates. Apple doesn't seem to include any store applications
+pre-installed, though.
+
+The Android approach is very similar: applications such as the browser
+are not removable and are updated with the system, but it's much more
+common to have store applications be pre-installed, including Google
+applications such as GMail, Google Maps, and so on.
+
+The reason why browsers, mail readers and contacts applications are
+built-in software that comes with the system is that they are considered
+integral parts of the core user experience. If one of these applications
+were to be removed, the user would not be able to use the device at
+all or would have a lot of trouble doing so: listening to music,
+browsing the web and reading email are basic expectations for any mobile
+consumer device.
+
+A second reason, which is also important, is that these applications often
+provide basic services for other applications to call upon. The classic
+example here is the contacts application that manages contacts used by
+text messaging, instant Internet messaging, email, and several other use
+cases.
+
+#### Case Study: a navigation application, how would it work?
+
+The navigation application was singled out as a case that has
+requirements and features that intersect those of built-in applications
+and those of store applications. On the one hand, the navigation
+application is core functionality, which means it should be part of the
+system. On the other hand, it should be possible to make the application
+extensible or upgradable, enabling the selling of updated maps, for
+instance.
+
+Collabora believes that the best way to solve this duality is to
+separate program and data, and to follow the lead of other platforms and
+their app stores in providing support for in-app purchases. This
+functionality is often used by games to provide additional characters,
+scenarios, weapons and such, but is also used by applications to provide
+content for consumption through the application, such as magazine issues
+and also maps.
+
+For such a feature to work, it needs to be provided as an API that
+applications can use to talk to the app store to place orders and to
+verify which data sets the user should be allowed to download. The
+actual data should be hosted at the app store for downloading
+post-validation.
+The disposition of the data, such as whether it should
+be made available as a single file or several, and whether the file or files
+are compressed or not, should be left for the application author to
+decide on, based on what makes more sense for the application.
+
+### Responsibilities of the Application Store
+
+The application store will be responsible for packaging a developer's
+store application bundle into a bundle file along with a “store
+directory” (see [][Store directory]) that contains some generated
+metadata and a signature. Special “SDK” system images will provide
+software development tools and allow the installation of unsigned
+packages, but the normal “target” system image will not allow the
+installation of packages that don't contain a valid store signature.
+
+The owner of the store, via the signing authority of the application
+store, will have the ability to accept or reject any application to be
+run on Apertis. By disallowing any form of “self-publication” by
+application developers, the store owner can ensure a consistent look and
+feel across all applications, screen applications for malicious
+behavior, and enforce rigorous quality standards.
+
+However, pre-publication screening of applications will represent a
+significant time commitment, as even minor changes to applications must
+undergo thorough testing. High-priority security fixes from developers
+may need to be given a higher priority for review and publication, and
+the priority of application updates may need to be considered
+individually. System updates will correspond to the busiest periods for
+both internal and external developers, and the application store will
+experience significant pressure at these times.
+
+For the application update system to work properly, each
+new release of an application needs to have a version number greater
+than that of the previous release. The store may need to make changes to
+application metadata between the developer's releases of that
+application. To allow the store to increment the application version
+without interfering with the developer's version numbers, the store will
+maintain a “store version” number to be appended to the developer's
+version number. The store version will start at 1 and be reset to 1 any
+time the developer increases the application version.
+
+As an example, if a developer releases an application with a version of
+2.5 for publication, the store will release this under the version
+2.5-1.
+
+> This approach closely resembles the versioning scheme used in dpkg
+> and rpm packages, which combine an “upstream version” with a
+> “packaging revision”.
+
+If the store ever needs to push an update to this
+application without waiting for the developer to create a new version,
+then the store version can be incremented from 1 to 2, and version 2.5-2
+can be released without any intervention from the developer. This is
+expected to be an uncommon occurrence, but may be done to correct
+packaging problems, or even to disable an application if it's found to
+have critical security flaws and the developer isn't responsive.
+
+### Identifying applications
+
+During the design of other Apertis components, it has become clear that
+several areas of the system design would benefit from a consistent way
+to identify and label application bundles and programs. In particular,
+the ability to provide a security boundary where inter-process
+communication is used relies on being able to identify the peer, in a
+way that ensures it cannot be impersonated.
+
+An application has several strings that might reasonably act as its
+machine-readable name in the system:
+
+  - the name of the application bundle, being the
+    [Flatpak `app-id`](http://docs.flatpak.org/en/latest/using-flatpak.html#identifiers)
+    or the name discussed in [Application bundle metadata]
+
+  - the D-Bus well-known name or names taken by the program(s) in the
+    bundle, for instance via GLib's GApplication interface
+
+  - the name of the AppArmor profile attached to the program(s) in the
+    bundle, if they have them
+
+  - the name(s) of the freedesktop.org .desktop file(s) associated with
+    the program(s), if they have them
+
+  - the name of the systemd user service (.service file) associated with
+    the program(s), if they have them
+
+We propose to align all of these as follows, matching the approach
+used by [Flatpak for its application identifiers](http://docs.flatpak.org/en/latest/using-flatpak.html#identifiers):
+
+  - The *bundle ID* is a case-sensitive string matching the syntactic
+    rules for a D-Bus interface name, i.e. two or more components
+    separated by dots, with each component being a traditional C
+    identifier (one or more ASCII letters, digits, or underscores,
+    starting with a non-digit).
+
+    This scheme makes every bundle ID a valid D-Bus well-known name,
+    but excludes certain D-Bus well-known names (those containing the
+    hyphen/minus). This allows hyphen/minus to be used in filenames
+    without ambiguity, and facilitates the common convention in which a
+    D-Bus service's main interface has the same name as its well-known
+    name.
+
+  - Application authors should be strongly encouraged to use a DNS name
+    that they control, with its components reversed (and adjusted to
+    follow the syntactic rules if necessary), as the initial components
+    of the bundle ID. For instance, the owners of collabora.com and
+    7-zip.org might choose to publish `com.collabora.MyUtility` and
+    `org._7_zip.Decompressor`, respectively. This convention originated
+    in the Java world and is also used for Android application packages,
+    Tizen applications, D-Bus names, GNOME applications and so on.
+
+  - Application-specific filenames on disk should be based on the bundle
+    ID. For instance, `com.collabora.MyUtility` might have its program,
+    libraries and data in appropriate subdirectories of
+    `/Applications/com.collabora.MyUtility/`. Built-in applications should
+    also use the bundle ID; for instance, the Frampton executable
+    might be `/usr/Applications/org.apertis.Frampton/bin/frampton`.
+
+  - App-store curators should not allow the publication of a bundle
+    whose name is a prefix of a bundle by a different developer, or a
+    bundle that is in the essential software set. App-store curators do
+    not necessarily need to verify domain name ownership in advance, but
+    if a dispute arises, the app-store curator should resolve it in
+    favour of the owner of the relevant domain name.
+
+  - Well-known namespaces used by platform components (such as
+    `apertis.org`, `freedesktop.org`, `gnome.org`, `gtk.org`) should be
+    restricted to app bundles associated with the relevant projects.
+    Example projects provided in SDK documentation should use the names
+    that are reserved for examples (see [RFC 2606](http://www.rfc-editor.org/info/rfc2606)),
+    such as `example.com`, but app-store curators should not publish
+    bundles that use such names.
+
+  - Programs in a bundle may use the D-Bus well-known name corresponding
+    to the bundle ID, or any D-Bus well-known name for which the
+    bundle ID is a prefix. For instance, the `org.apertis.Frampton`
+    bundle could include programs that take the bus names
+    `org.apertis.Frampton`, `org.apertis.Frampton.UI` and/or
+    `org.apertis.Frampton.Agent`.
+
+  - Programs in a bundle all use the same AppArmor profile. As a result
+    of the convention that AppArmor profile names are equal to on-disk
+    filenames, the profile's name must start with the installation
+    location based on the bundle ID.
+
+    Further, to allow upgrade and rollback to be carried out without making
+    the system insecure, we currently require that every store app-bundle's
+    AppArmor profile is deterministically derived from the bundle ID, by
+    being exactly `/Applications/${bundle_id}/**` where `${bundle_id}`
+    represents the bundle ID.
+    (See [][AppArmor profiles] for rationale for this choice.)
+
+    For instance, all programs in the
+    `org.apertis.Frampton` built-in app-bundle would run under a profile
+    whose name starts with `/usr/Applications/org.apertis.Frampton/`,
+    and all programs in the `com.example.ShoppingList` store app-bundle
+    would run under a profile named
+    `/Applications/com.example.ShoppingList/**`.
+
+  - If a program is a systemd user or system service, the service file
+    should be the program's D-Bus well-known name followed by `.service`,
+    for example `org.apertis.Frampton.Agent.service`. Similarly, if a
+    program has a freedesktop.org .desktop file, its name should be the
+    program's D-Bus well-known name followed by `.desktop`, for example
+    `org.apertis.Frampton.UI.desktop`.
+
+In particular, using the bundle ID in the AppArmor profile name makes
+it trivial for a D-Bus service to identify the application bundle to
+which a peer belongs:
+
+  - the service can learn the AppArmor profile name via the standard
+    `GetConnectionCredentials` D-Bus method call
+
+  - if the profile starts with `/Applications/`, followed by a
+    syntactically valid bundle ID, followed by either end-of-string or `/`,
+    then the peer is a store app-bundle with the bundle ID that appears
+    after the second `/`
+
+  - if the profile starts with `/usr/Applications/`, followed by a
+    syntactically valid bundle ID, followed by either end-of-string or `/`,
+    then the peer is a built-in app-bundle with the bundle ID that appears
+    after the third `/`
+
+  - if the profile starts with one of the well-known executable
+    directories for the platform (`/usr/`, `/bin/`, `/lib/` etc.)
+    and does not start with `/usr/Applications`, or the profile has the
+    special value `unconfined` indicating the absence of AppArmor
+    confinement, then the peer is a platform component
+
+  - otherwise, the peer is in an unknown category and must not be given
+    any special privileges
+
+The same approach can be used across any other IPC channel on which a
+process can securely query the peer's LSM (Linux Security Module)
+context, such as Unix sockets or kdbus.
+
+A library available to platform services should provide a recommended
+implementation of this algorithm.
+
+> This was implemented in `libcanterbury-platform`.
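+
+As a rough illustration, a minimal sketch of this classification algorithm
+in plain C follows. It assumes the AppArmor profile name has already been
+obtained (for example via `GetConnectionCredentials`), and some edge cases
+(such as a profile named `/usr/ApplicationsFoo`) and error handling are
+elided for brevity; the real implementation is the one in the shared
+library mentioned above.
+
+```c
+#include <ctype.h>
+#include <stdbool.h>
+#include <stddef.h>
+#include <string.h>
+
+typedef enum { PEER_STORE, PEER_BUILTIN, PEER_PLATFORM, PEER_UNKNOWN } PeerKind;
+
+/* One dot-separated component: a traditional C identifier. */
+static bool is_identifier (const char *s, size_t len)
+{
+  if (len == 0 || isdigit ((unsigned char) s[0]))
+    return false;
+  for (size_t i = 0; i < len; i++)
+    if (!isalnum ((unsigned char) s[i]) && s[i] != '_')
+      return false;
+  return true;
+}
+
+/* True if s starts with a syntactically valid bundle ID (two or more
+ * identifier components separated by dots) followed by '/' or the end
+ * of the string. */
+static bool has_bundle_id_prefix (const char *s)
+{
+  size_t components = 0, start = 0, i = 0;
+
+  for (;; i++)
+    {
+      if (s[i] == '.' || s[i] == '/' || s[i] == '\0')
+        {
+          if (!is_identifier (s + start, i - start))
+            return false;
+          components++;
+          if (s[i] != '.')
+            break;
+          start = i + 1;
+        }
+    }
+  return components >= 2;
+}
+
+static PeerKind classify_peer (const char *profile)
+{
+  static const char *const platform_prefixes[] = { "/usr/", "/bin/", "/lib/" };
+
+  /* Store app-bundle: /Applications/<bundle-id>... */
+  if (strncmp (profile, "/Applications/", 14) == 0)
+    return has_bundle_id_prefix (profile + 14) ? PEER_STORE : PEER_UNKNOWN;
+
+  /* Built-in app-bundle: /usr/Applications/<bundle-id>... */
+  if (strncmp (profile, "/usr/Applications/", 18) == 0)
+    return has_bundle_id_prefix (profile + 18) ? PEER_BUILTIN : PEER_UNKNOWN;
+
+  /* Platform component: unconfined, or a well-known executable directory. */
+  if (strcmp (profile, "unconfined") == 0)
+    return PEER_PLATFORM;
+  for (size_t i = 0; i < sizeof platform_prefixes / sizeof platform_prefixes[0]; i++)
+    if (strncmp (profile, platform_prefixes[i], strlen (platform_prefixes[i])) == 0)
+      return PEER_PLATFORM;
+
+  /* Anything else must not be given any special privileges. */
+  return PEER_UNKNOWN;
+}
+```
+
+A service would typically run such a check on the profile name of every
+peer and grant elevated privileges only to the categories it trusts.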
+
+### Application Releasing Process
+
+Once application testing is complete and an application is ready to be
+distributed, the application releasing process should contain at least
+the following steps:
+
+  - Verify that the application's bundle ID does not collide with any
+    bundle by a different publisher (in the sense that neither is a
+    prefix of the other).
+
+  - Generate an AppArmor profile for the application based on its
+    [permissions].
+
+  - Generate the application's [][Store directory].
+
+  - Make the application available at the store.
+
+### Application Installation Tracking
+
+The System Updates and Rollback design describes a method of migrating
+settings and data from an existing Apertis system to another one. To
+work properly, the application store would need to have a list of
+applications installed on a specific Apertis device.
+
+If the application store keeps a database of vehicle IDs and the
+applications purchased for them, this will facilitate software updates
+and simplify software re-installation after a system wipe.
+
+The application store can only know which applications have been
+downloaded for use in a specific vehicle – with no guarantee of a
+persistent Internet connection, the store has no way to know whether the
+application has really been installed or subsequently uninstalled. The
+store also can't reliably track what version of an application is
+installed.
+
+If an application is downloaded on a computer with a web browser
+(presumably for installation via external media), the store shouldn't
+assume it was actually installed anywhere. Only applications installed
+directly to the device should be logged as installed. When the user logs
+in to the store (or the device logs into the store with the user's
+credentials to check for updates), the list of installed packages can be
+synchronized.
+
+If an application is installed from a USB storage device, the application
+manager could write a synchronization file back to the device that could
+subsequently be uploaded back to the application store from a web
+browser. Care should be taken to ensure these files can't be used by
+malicious users to steal applications – the store should check that the
+applications listed in the synchronization file have been legitimately
+purchased by the user, and the file's contents should be discarded if
+they have not.
+
+To perform a migration for a device that hasn't had a consistent
+Internet connection, the device could be logged into the store to
+synchronize its application list prior to beginning the migration
+process.
+
+### Digital Rights Management
+
+Details of how DRM is to be used in Apertis are not finalized yet, but
+some options are presented here.
+
+The store is in a convenient position to enforce access control methods
+for applications. When an application is purchased, the application
+store can generate the downloadable bundle with installation criteria
+built in.
+
+The installation could be locked in the following ways:
+
+  - Locked to a specific vehicle ID – it will only install on a specific
+    vehicle. The Apertis unit will refuse to install the application if
+    the vehicle ID does not match the ID embedded in the downloaded
+    application package.
+
+  - Locked to a specific device ID – it will only install on a specific
+    Apertis unit.
+
+  - Locked to a customer ID – it will only install for a specific
+    person, as represented by their store account – presumably a store
+    account must be present and logged in for this to work. The store
+    account is assumed to be analogous to an Apple Store or Google Play
+    account: as noted in [][Terminology], we recommend
+    avoiding the term “user” here, since a store account does not
+    necessarily correspond 1:1 to the “users” discussed in the Multiuser
+    design document.
+
+Any “and” combination of these three locks could also be used. For example,
+an application bundle may only be installable to a specific device in a
+specific vehicle (in other words, locked to vehicle ID and device ID) –
+if the Apertis unit is placed in another vehicle, or the vehicle's
+Apertis unit is replaced, the application bundle would not be
+installable.
+
+Conversely, rights could also be combined with the “or” operator, such
+as allowing an application bundle to be installed if either the correct
+Apertis unit is used, or the correct vehicle. Collabora recommends these
+combinations not be implemented. Most of the combinations provided by
+“or” aren't obviously useful.
+
+It might also be useful to distribute some packages in an unlocked form
+– free software, ad-sponsored software, or demo software may not
+require any locking at all. Ultimately, this is a policy decision, not a
+technical one, as they could just as easily be locked to the
+downloader's account.
+
+Note that these are all install-time checks, and if a device is moved to
+another vehicle after successfully installing a bundle, it may result in
+running an app somewhere that an application developer or OEM didn't
+intend it to be run. In order to prevent this from happening, it would
+be more reliable to do launch-time testing of the applications.
+
+The store would generate a file to be bundled with the application that
+listed the launch criteria, and the application manager would check
+those criteria before launching the application for use.
+
+It should be considered that launch-time testing would require a user to
+be logged in to the store in some way if the applications are to be
+keyed to a store account. This would make it impossible to launch
+certain applications when Apertis is without network connectivity, and
+could be a source of frustration for end users.
+
+### Permissions
+
+Applications can perform many functions on a variety of user data. They
+may access interfaces that read data (such as contacts, network state,
+or the user's location), write data, or perform actions that can cost the
+user money (like sending SMS). As an example, the Android operating
+system has a comprehensive [manifest][Android-manifest]
+that governs access to a wide array of functionality.
+
+Some users may wish to have fine-grained control over which applications
+have access to specific device capabilities, and even those who don't
+should likely be informed when an application has access to their data
+and services.
+
+See the [Permissions concept design] for further details.
+
+Application developers will declare the permissions their application
+depends on in [application bundle metadata], and Apertis
+will allow a user to approve a subset of an application's required
+permissions.
+
+There are some difficulties in allowing users to accept only some of the
+permissions an application developer expected their software to have
+access to:
+
+  - Some of the permissions will be controlled by an AppArmor profile
+    generated by the application store.
+    The user is merely accepting the
+    profile; actually changing it would not be trivial.
+
+  - AppArmor profiles are per-application, not per-user. AppArmor
+    profiles would need to be changed on user switch if different users
+    required different permission configurations for the same
+    applications.
+
+  - A huge testing burden is placed on the application developer if they
+    can't rely on the requested permissions. They must test their
+    applications in all possible configurations.
+
+  - The permissions may be required for the application developer's
+    business model – be that network permissions for displaying
+    advertising, or GPS information for crowd-sourcing traffic
+    information. Allowing the user to restrict permissions in these
+    situations would be unfair to the developer.
+
+To mitigate some of these problems, there must be two kinds of
+permissions: required and optional. Required permissions are those that
+can't be removed from an application – such as anything granted by the
+AppArmor profile. If a user chooses to deny a required permission, an
+application cannot run.
+
+Optional permissions are handled by higher-level APIs in the SDK and may
+be influenced by system settings. Apertis-specific “wrapper” services
+that abstract the functionality of lower-level libraries can provide
+access controls. These wrapper services would act based on the
+individual user's settings and preferences, so each user would have
+control over the applications they use. Because these services must act
+as a trust boundary between apps within the scope of a particular user's
+account, a privilege boundary must be imposed between the app bundle and
+the wrapper service: to provide this boundary, they must be implemented
+as a separate service process, rather than merely a library that is
+loaded by the application program.
+
+Some permissions may prove to be more of an annoyance than helpful to
+the user. For example, if the agents described in [][Start] are employed
+by vast numbers of applications, users may not wish to be informed every
+time a new application requires one. It may be worth considering having
+some permission acceptance governed by system settings, and only directly
+querying the user if a permission is “important” (such as sending SMS).
+
+## Data Storage
+
+Applications will have access to several types of writable
+application storage.
+Care should be taken to select the appropriate area, as different areas
+are handled differently if rollbacks (see [][Roll-back]) occur. The
+storage types are:
+
+  - Application User – for settings and any other per-user files. In the
+    event of an application rollback, files in this area are rolled back
+    with their associated application.
+
+  - Application Everyone – for data that is rolled back with an
+    application but isn't tied to a user account – such as voice samples
+    or map data.
+
+  - Cache – for easily recreated data. If the system is low on storage
+    space, it may reclaim cache space for applications that aren't
+    currently running. Caches will be cleared on update and rollback
+    instead of being stored by the rollback system.
+
+  - Shared – This area's contents are not touched by the application
+    management framework for any reason. They are also not subject to
+    any form of rollback. This area is intended for storage of videos,
+    music, photos and other data in standard formats that aren't tied to
+    a single application, analogous to `/sdcard` on Android devices.
+    Since storage space on Apertis systems may be limited, and since it
+    is thought that users
+    will usually want to share media between accounts, the data in this
+    storage area will be accessible by all users. More details on this
+    are available in the [](multiuser.md) design.
+
+The [XDG Base Directory Specification] environment variables will
+guide applications to find the appropriate locations for the different
+storage types.
+
+### Extending Storage Capabilities
+
+It may be desirable for some Apertis devices to allow the user to
+install an SD card to increase storage capacity. Since SD cards are
+removable – possibly even at runtime – they present some problems that
+need to be addressed:
+
+  - Allowing applications to be run from SD cards makes it more
+    difficult to prevent software piracy.
+
+  - An SD card should be properly unmounted by the system before being
+    physically removed from the device.
+
+It is recommended that SD card storage not be used for the installation
+of applications or any manner of system software, as this could give
+users a way to run untrusted code, or tamper with application settings
+or data in ways the developers haven't anticipated. Media files are
+obvious candidates for placement on this type of removable storage, as
+they don't provide key system functionality, and are not trusted data.
+
+If it is critical that applications (or other trusted data, such as
+navigation maps) be run off of removable storage, allowing the system to
+“format” the device before use – deleting all data already on the card
+and replacing it with an encrypted BTRFS filesystem – would provide a
+secure method of placing application storage on the device.
+
+The dm-crypt [LUKS] system would be used to encrypt the storage
+device using a random binary key file. These key files would be
+generated at the time the external storage device is formatted and
+stored along with the device serial number. One way to generate a key
+file would be to read an appropriate number of bytes (such as 32) from
+`/dev/random`; a sketch of this appears below.
+
+The key store will be in a directory in the var subvolume (but not in
+`/var/lib`) as the var subvolume is not tracked by system rollbacks. If
+the key files were in a volume subject to rollbacks, they would
+disappear and render external storage unreadable after a system rollback
+that crossed their creation date.
+
+It is imperative that the key store not be accessible to a user, as it
+would allow them to directly access their removable storage device on
+another computer and potentially copy and distribute applications.
+
+The device could be recognized by its label as reported
+by the `blkid` command, and added to the startup application scan in
+[][Boot time procedures].
+
+If this is extended to multiple SD cards, difficulties arise in deciding
+which storage device to install an application to. Either configuration
+options will need to be added to control this, or the device with the
+greatest free space at the time of installation can be selected.
+
+Many embedded devices require some manner of disassembly to remove an SD
+card, preventing the user from removing it while the system is in
+operation (such as a mobile phone that hides the SD card behind the
+battery). If an approach such as this is used, there is no need for
+special “eject” procedures for the SD storage. If this is not possible,
+however, some manner of interface will need to be provided so the user
+can safely unmount the SD card before removal.
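+
+As a concrete illustration of the key-file generation step mentioned
+above, a minimal sketch in C follows. The key-store path and the
+per-serial-number file naming are illustrative assumptions, not a fixed
+Apertis interface:
+
+```c
+#include <fcntl.h>
+#include <stdio.h>
+#include <unistd.h>
+
+static int
+write_key_file (const char *serial)
+{
+  char path[256];
+  unsigned char key[32];
+  ssize_t got = 0;
+  int fd;
+
+  /* Hypothetical key-store location inside the var subvolume. */
+  snprintf (path, sizeof path, "/var/sdcard-keys/%s.key", serial);
+
+  /* Read 32 bytes of high-quality randomness for the key. */
+  fd = open ("/dev/random", O_RDONLY);
+  if (fd < 0)
+    return -1;
+  while (got < (ssize_t) sizeof key)
+    {
+      ssize_t n = read (fd, key + got, sizeof key - got);
+      if (n <= 0)
+        {
+          close (fd);
+          return -1;
+        }
+      got += n;
+    }
+  close (fd);
+
+  /* Mode 0600 in a root-owned directory keeps the key out of reach of
+   * ordinary users, as required above. */
+  fd = open (path, O_WRONLY | O_CREAT | O_EXCL, 0600);
+  if (fd < 0)
+    return -1;
+  if (write (fd, key, sizeof key) != (ssize_t) sizeof key)
+    {
+      close (fd);
+      unlink (path);
+      return -1;
+    }
+  return close (fd);
+}
+```
+
+The resulting file could then be supplied to `cryptsetup` via its
+`--key-file` option when the card is formatted and later unlocked.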
+
+If it's physically possible for a user to remove the SD card while the
+system is running, the operating system and applications may be exposed
+to situations that are difficult to recover from, and to poorly tested
+code paths. These sorts of SD card sockets should probably not be used
+for cards using the BTRFS filesystem. Instead, the better-tested FAT-32
+filesystem should be used.
+
+## Application Management
+
+Applications will be distributed by the application store as
+compressed “application bundles”
+containing programs and services that can be launched in a
+variety of ways – with the limitation that an application bundle can't
+contain more than one program launchable from the application launcher,
+or more than one agent.
+
+> All communication with the application store will take place over a
+> secure HTTPS connection.
+
+The metadata in this bundle provides
+information about the application such as its user-friendly name,
+services it needs from the system (such as querying the GPS),
+permissions it needs from the user, and the versions of system APIs it
+depends on.
+
+An application bundle may provide back-ends to existing system
+functionality and add new features to installed software without
+necessarily adding any new applications to the application manager.
+These are called “system extensions” and are detailed in [][System extensions].
+
+### Store Applications
+
+#### Acquisition
+
+Applications will be made available through the application store as
+compressed files.
+
+Since Apertis may have limited or no Internet connectivity, it must be
+possible to download an application elsewhere and install it from a USB
+storage device. Even if Internet connectivity is available, the download
+process must be reliable – it must be possible to resume a partially
+completed application download if the connection is broken or Apertis is
+shut down before the download completes.
+
+A background download service will be provided by Apertis (see
+[][Reliable download service]). This service will continue downloads if they
+are interrupted or if the system is restarted. When the download is
+completed, or if the download is incapable of being completed, a
+callback will be made to the requester via D-Bus.
+
+The user interface components will be able to query status from the
+download service in order to display status about the installation –
+including a completion percentage or a position in the download queue.
+
+#### Installation
+
+If an application is being installed directly from the store, the
+application bundle metadata will be downloaded and the user allowed to select
+a subset of [permissions] to allow, or cancel the installation. If the
+installation is to proceed, an icon to be displayed in the launcher
+while the download and installation take place will then be acquired
+from the application store.
+
+If the application is being provided from a USB device, the application
+bundle metadata and icon are extracted from the application bundle.
+
+Displaying an accurate progress indicator while installing an
+application is non-trivial. One simple option is to include the full
+decompressed size of the application in its metadata and
+send an update to the user interface occasionally based on the number
+of bytes written.
+
+This assumes that “number of bytes left to install” directly correlates
+to “amount of time left to completion”, and suffers from a couple of
+common problems:
+
+  - Eventually storage caches are filled and begin writing out, causing a
+    dramatic slowdown in apparent installation speed for larger
+    applications.
+
+  - Decompression speed may vary for different parts of the same
+    archive.
+
+However, users are unlikely to notice even moderate inaccuracies in an
+installation percentage indicator, so this may be adequate without
+requiring complicated development that may not solve these problems
+anyway.
+
+#### Upgrades
+
+If configured with a suitable Internet connection, the system will
+periodically check whether upgrades are available for any store
+applications that have been installed. Apertis will provide its vehicle
+ID to the application store and the application store will reply with a
+list of the most recent versions of the applications authorized for the
+vehicle. If Apertis has had software installed or removed without an
+Internet connection, the list of installed applications will be
+synchronized with the store at this time.
+
+Some users may voice concerns over the store's tracking of all the
+installed packages on their Apertis. It may be worth mentioning in a
+“privacy policy” exactly what the data will be used for.
+
+If no Internet connection is available, the user can still supply a
+newer version of an application on a USB device to start an upgrade.
+They can acquire application bundles from the store web page, which will
+provide the latest version of applications for download. Old application
+versions will not be available through the store.
+
+Since the application store attempts to track installed applications, it
+could notify a user by e-mail when updates are available, or show a list
+of updated applications when the user logs in to the store.
+
+In order to allow application rollbacks to roll back the user data
+associated with an application, all running instances of an application
+will have to be closed prior to starting an upgrade, for data coherency
+reasons. The user will be unable to launch an instance of the
+application during the upgrade process. The system won't recognize
+services, handlers, or launchable components of the application until
+the final phase of installation is complete.
+
+#### Removal
+
+When a user removes an application, any personal settings and caches
+required by the application will be automatically removed along with any
+user-specific data stored for rollback purposes – files the application
+has stored in general storage will be left behind.
+
+Removing a third-party music player shouldn't delete the user's music
+collection, but it should delete any configuration information specific
+to that player. For this to work properly, application developers need
+to be careful to store data in the appropriate locations.
+
+#### Roll-back
+
+Apertis may have a per-application rollback system that allows an end
+user to revert to the last installed version of an application (that is,
+a single previously installed version will be kept when an upgrade is
+performed), with all their settings and data in exactly the state they
+were in the last time they used it.
+
+This rollback paradigm has some interesting quirks:
+
+  - If a user rolls back an application, all other users of that
+    application will also be rolled back. This allows one user to delete
+    some of another user's settings and personal data.
+
+  - As some software updates may contain critical security fixes, an
+    ever-growing blacklist will have to be maintained to prevent a user
+    from rolling back to potentially dangerous versions.
+
+  - Developers will have no control over what software versions their
+    customers are using, making long-term support very difficult. They
+    may receive bug reports for bugs already fixed in newer versions of
+    the software.
+
+  - Old versions of applications may break if they interact with online
+    services that changed their protocols, or if Apertis APIs are
+    deprecated.
+
+  - Application developers have to think very carefully about what data
+    goes into application storage (and is subject to rollbacks) and
+    general storage (which isn't). In reality, application developers
+    will likely pay very little attention to this distinction and the
+    application store will carry this burden.
+
+  - The effect of a system rollback on installed applications is
+    unclear. If an application has been upgraded twice since the last
+    system update and a full system rollback occurs, it is possible for
+    applications to have no launchable version installed.
+
+  - In some cases an application rollback may not even be possible, if
+    the old version of the application is not capable of running on the
+    current version of the system.
+
+  - Settings management tools like GSettings directly manipulate
+    application setting data and don't currently support the rollback
+    system.
+
+The settings problems can be mitigated by using the persistence API from
+the SDK when writing applications, allowing Apertis to hide the
+complexity from the application developer. Each application should have
+its own database of settings instead of using a single system-wide
+database.
+
+After an application rollback, launching the application will use the
+previously installed version,
+with all settings and private data in the state they were in before the
+upgrade.
+
+### System Extensions
+
+In the context of Apertis, system extensions may refer to themes and
+skins which provide global user interface changes, or plug-ins for
+existing frameworks that aren't intended for extension by regular
+application developers.
+
+Generally speaking, these will be purchasable add-ons that don't fit
+into the category of “application”, and are instead additions to basic
+system functionality. Examples include downloadable content that
+radically changes the visual appearance of all applications under
+Apertis, or a plug-in that integrates Skype with the contacts and
+communications software.
+
+While these don't fit the standard role of an application, they are
+still made available as bundles through the application store, and their
+installation is still handled by the application manager.
+
+The application bundle metadata will have a list of known extension “types”,
+and extension components inside an application bundle
+will be handled differently based on the specified type. There is no
+comprehensive list of extension types, but “Telepathy connection
+manager” and “theme” will be the commonly used examples in this
+document.
+
+System extensions, being outside the realm of regular application
+developers, are entitled to make assumptions about available libraries
+and frameworks that applications are not. This makes rolling them back
+independently complicated, and some simplifications are made by
+disallowing manual rollback of extensions, and only rolling them back
+automatically with a system rollback.
+
+#### Installation
+
+There will be no difference between an application bundle and a “system
+extension bundle” – and it may even be desirable to deploy an application
+with supporting system extensions from the same bundle.
+
+Most of the process for installing a bundle with system extensions will
+be no different than the usual application installation process.
+However, the “application specific metadata” configuration will include
+exporting files in the system extension directories.
+
+Depending on the extension,
+the newly installed extension may not be functional until daemons are
+restarted, or programs rescan plug-in directories.
+
+Determining what needs to be restarted can be difficult, and could be
+different depending on what other system extensions have been installed.
+For simple add-ons like themes, or Telepathy connection managers, no
+restarts or re-loads should be required, so no special effort needs to
+be made.
+
+For more invasive system extensions, the application manager can decide,
+based on the extension type in the application bundle metadata, whether the
+functionality requires that the system be restarted. The user should be
+informed during installation that new features will only be present next
+time they start their vehicle.
+
+#### Upgrades
+
+There may be additional steps required based on the extension type – for
+example, if a theme is being upgraded, the application manager should
+check if it is the theme currently being used to render GUI elements. If
+it is, the system may need to switch to a default theme before the
+upgrade begins, and switch back after the upgrade finishes.
+
+Apart from any extension-type-specific steps performed by the
+application manager, the upgrade process will be exactly as described in
+[][Upgrades].
+
+#### Removal
+
+Once again, the process only deviates from [][Removal] by
+performing any specific actions required by the extension type before
+following the standard procedure.
+
+As an example, if the extension is a theme, the system should ensure it
+is not currently in use before beginning the usual removal process.
+
+#### Rollback
+
+As with regular applications, a system rollback will automatically roll
+back system extension components.
+
+An intentional rollback will only need special steps at the start of the
+process, dependent on the type of extension being handled.
+
+Since system extensions are likely to be low-level components, it may be
+a good idea to disallow rolling them back in order to ensure important
+bug fixes can be deployed.
+
+### License Agreements
+
+> Collabora does not have legal expertise in these matters, and any
+> authoritative information – especially if financial damages may be
+> involved – should be supplied by the appropriate legal
+> advisers.
+
+Each application may have its own license agreements, privacy policies,
+or other stipulations a user must accept before they can use the
+application. Different OEMs may have different requirements, and the
+legal requirements governing the contents of these documents may vary
+from country to country.
+
+Such licenses generally disclose information regarding the use of data
+collected by an application or related services, define acceptable usage
+of the application or services by a user, or discuss the warranty and
+culpability of the application provider.
+
+Regardless of content, Apertis should make all reasonable efforts to
+ensure a user has agreed to the appropriate agreements before they may
+use an application.
+The first step to accomplishing this goal is to
+require a user to accept the license agreement before downloading an
+application from the store.
+
+As this only requires a single user to accept the agreement, and does
+nothing for built-in applications, it is an incomplete solution.
+Requiring acceptance of the license terms when an application is
+installed, or when it is enabled for a user's account, would increase
+the likelihood that a user has agreed to the appropriate license.
+
+If license terms change between releases, it might be advisable to ask
+users to accept the license terms on the first launch after an
+application update or rollback as well.
+
+Ultimately, there is no guarantee that the person using an Apertis
+account is the person who agreed to an application's license.
+
+Some licenses, such as the GPL, inform the user of their rights to
+obtain a copy of the source code of the software. Licenses like this
+should be made available to the user, but don't necessarily need to be
+displayed unless the user explicitly requests the
+information.
+
+## Application Run Time Life-Cycle
+
+The middleware will assist UI components in launching and managing
+applications on Apertis. Application bundles can provide executable
+files (programs) to be launched via different mechanisms, some of them
+user-visible (perhaps as icons in the application manager that will
+launch a graphical program), and some of them implicit (like a
+connection manager for the Telepathy framework, or a graphical program
+that does not appear in menus but is launched in order to handle a
+particular request).
+
+On a traditional Linux desktop, a graphical program doesn't generally
+make a distinction between foreground and background operation, though
+it may watch for certain events (input focus, window occlusion) that
+could be used to monitor that status. Some mobile operating systems
+(Android, iOS) hide the details of background operation from the user;
+others (WebOS, MeeGo) allow the user to interact with background
+applications more directly.
+
+The approach will be similar to that of Android and iOS – whether an
+application (graphical program) is actually running is hidden from the
+user. The user may either launch new applications or press the “back”
+button to return to the last running application.
+
+From the user's perspective, applications will be persistent. When a
+user comes back to an application they've previously used, it will be in
+the same state they left it – even if the vehicle has been turned off
+and restarted.
+
+### Start
+
+There are multiple ways in which a program associated with an
+application bundle, whether graphical or not, can be started by Apertis:
+
+  - Direct launch – application bundles may contain an image or widget
+    to be displayed in the application launcher, which will launch a
+    suitable graphical program. The name and icon shown in the application
+    launcher is part of the [entry point metadata][Application entry points].
+
+  - By data type association – the content-type (MIME type) of data is
+    used to select the appropriate application to handle the request.
+    Applications will provide a list of content-types (if any) that they
+    handle in the [entry point metadata][Application entry points];
+    activating the application with the
+    corresponding content type will launch the corresponding graphical
+    program.
+
+  - Sharing back-ends – applications may define sharing capabilities
+    that allow other applications to launch them and send a
+    receiver-limited amount of data. Again, activating an application in
+    this way would normally start a corresponding graphical program.
+
+  - Agents – persistent non-GUI processes that provide a background
+    component for applications. These will be launched automatically at
+    boot time or immediately after application installation. An
+    application bundle will contain at most a single agent, and the
+    [permissions] will include an “agent” permission to allow users to know
+    they're installing an application that uses one.
+
+We refer to the programs that are launched in these ways as *entry points*.
+
+Collabora feels that the first three types of launch should be under the
+control of the application manager. See
+[][Responsibilities of the application manager].
+
+Another method of launching processes is present – D-Bus activation. If
+a D-Bus client attempts to use a well-known name for a service that isn't
+currently running, D-Bus will search its configuration files for an
+appropriate handler to launch. This sort of activation is more useful
+for system-level developers, and won't be used to launch graphical
+programs.
+
+During pre-publication review by the app store, careful attention should
+be paid to application bundles that wish to use agents, and the resource
+consumption of the agents. The concept does not scale – it creates a
+system where the number of installed application bundles can
+dramatically affect run-time performance as well as system boot-up time.
+
+When a program in an application bundle is started by the application
+manager, in certain circumstances the manager will need to take extra
+steps. For the first launch of a built-in application, or the first
+launch after one has been updated, a subvolume will need to be created
+for storing user data.
+
+### Background Operation
+
+More than one graphical program may be running at the same time, but the
+user can only directly interact with a limited number of graphical
+programs at any instant. For example, 1/3 of the screen may be giving
+driving directions while the other 2/3 of the screen displays an e-mail
+application. Concurrently, in the background, a music player may be
+running while several RSS feed readers are periodically updating.
+
+Background tasks may also be performed by long-running agents. Agents
+run for the duration of the user's session, and are only terminated if
+the system needs to unmount an application's subvolume – either to shut
+down the system or to upgrade or uninstall the application.
+
+Graphical programs will be notified with a D-Bus message when they lose
+focus and are relegated to background status – the response to this
+notification is application-dependent. If a program has no need to perform
+processing in the background, it may save its current state and
+self-terminate, or it may remain idle until re-focused. Some graphical
+programs will continue to operate in the background – for example, a
+navigation application might remain active in the background and
+continue to give turn-by-turn instructions.
+
+Graphical programs that need to perform tasks in the background will
+have to set the “background” permission in their [permissions]. Ideally they
+should be designed with a split between foreground and background
+components (perhaps using a graphical program for the user interface and
+an agent for the background part) instead.
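+
+As a rough illustration, the sketch below shows how a graphical program
+might subscribe to such a notification using GLib's GDBus API. The
+interface and signal names used here (`org.apertis.Compositor`,
+`FocusLost`) are purely hypothetical, since the concrete D-Bus API is
+not defined in this document; error handling is elided.
+
+```c
+#include <gio/gio.h>
+
+static void
+on_focus_lost (GDBusConnection *connection,
+               const gchar     *sender_name,
+               const gchar     *object_path,
+               const gchar     *interface_name,
+               const gchar     *signal_name,
+               GVariant        *parameters,
+               gpointer         user_data)
+{
+  /* Application-dependent policy: save state and self-terminate,
+   * stay idle until re-focused, or (with the "background" permission)
+   * keep working, e.g. continuing turn-by-turn navigation. */
+  g_message ("Lost focus; saving state and going idle");
+}
+
+int
+main (int argc, char **argv)
+{
+  GMainLoop *loop = g_main_loop_new (NULL, FALSE);
+  GDBusConnection *bus = g_bus_get_sync (G_BUS_TYPE_SESSION, NULL, NULL);
+
+  g_dbus_connection_signal_subscribe (bus,
+                                      NULL,                     /* any sender */
+                                      "org.apertis.Compositor", /* hypothetical */
+                                      "FocusLost",              /* hypothetical */
+                                      NULL,                     /* any object path */
+                                      NULL,                     /* no arg0 filter */
+                                      G_DBUS_SIGNAL_FLAGS_NONE,
+                                      on_focus_lost,
+                                      NULL, NULL);
+  g_main_loop_run (loop);
+  return 0;
+}
+```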
+
+If a background graphical program wishes to be focused, it can use the
+standard method for requesting that a window be brought to the
+foreground.
+
+### End
+
+Applications written for Apertis have persistent state, so from a user's
+perspective they never end. Apertis still needs to be able to terminate
+applications to manage resources, perform user switching, or prepare for
+shutdown.
+
+Programs – either graphical or not – may be sent a signal by the
+middleware at any time requesting that they save their state and exit.
+Even if the application bundle has the background permission, its
+processes may still be signaled to save their state in the case of a
+system shutdown or a user switch.
+
+To prevent an application that doesn't respond to the state-saving
+request from delaying a system shutdown or interfering with the system's
+ability to manage memory, processes will be given a limited amount of
+time (5 seconds) to save their state before termination. Applications
+that don't need to save state should simply exit in response to this
+signal.
+
+It should be noted that state saving is difficult to implement, and much
+of the work is the responsibility of the application writer. While
+Apertis can provide functions for handling the incoming signal and
+storing state data (see
+[][State saving convenience APIs]), the
+hardest part is determining exactly what application state needs to be
+saved in order for the application to exit and later restart exactly as
+it had previously been running.
+
+There is no standard Linux API for saving application state. POSIX
+defines `SIGSTOP` and `SIGCONT` signals for pausing and resuming programs,
+but these signals don't remove applications from memory or provide any
+sort of persistence over a system restart. Since they're unblockable by
+applications, the application may be interrupted at any time with no
+opportunity to do any sort of clean-up.
+
+However, some applications may react to changes in system state – such
+as network connectivity. One method of preventing applications from
+reacting to D-Bus messages, system state changes, and other signaling is
+to use `SIGSTOP` to halt an application's processing. The application
+becomes responsible for handling whatever arises after `SIGCONT` causes it
+to resume processing (such as a flood of events or network timeouts).
+
+Automatically saving the complete state of an application is essentially
+impossible – even if the entire memory contents are saved, the
+application may have open files, or open connections on remote servers,
+or it may have configured hardware like the GPU or a Bluetooth device.
+
+For a web browser the state might be as simple as a URL and display
+position within the page, and the page will be reloaded and redisplayed
+when the browser is re-launched. However, if the user was in the middle
+of watching a streaming video from a service that requires a log-in, the
+amount of information that needs to be retained is larger and has
+potential security ramifications.
+
+It's possible that a viewer application may exit and the file it was
+viewing be deleted before the application's next start, making it
+impossible to completely restore the previous application state.
+Applications will be responsible for handling such situations
+gracefully.
+
+#### State Saving Convenience APIs
+
+As state saving is a difficult problem for developers, it seems
+appropriate for Apertis to provide an API to assist in performing the
+task accurately.
+
+A minimal C language API for state saving could be developed, consisting
+of:
+
+  - A way to register a callback for a D-Bus signal that requests a save
+    of state information.
+
+  - Functions to atomically serialize data structures to application
+    storage.
+
+  - Functions to read previously serialized data structures into memory.
+
+  - Functions to clear previously saved state.
+
+  - Documentation and sample code for using the API.
+
+This API's usage won't be mandatory for application developers.
+
+If it is intended that users have control over which apps save state and
+which merely close on exit, this API could also provide the code to
+handle those configuration options.
+
+Maemo provided, through [libosso], a very simple state-saving API.
+It expected the relevant application state to be contained within a single
+contiguous memory region, and provided a call that would write out this
+single memory area to some abstract storage area that persists
+across reboots. On startup, an application would attempt to re-read that
+memory, and if no pre-existing state was present, would start over.
+
+### Frozen
+
+Interest has been expressed in creating a state with less resource
+consumption than background operation, yet with faster start-up
+times than ending the process and saving state.
+
+This is a very difficult problem to solve without application
+intervention – simply dumping the memory contents of a process to
+long-term storage won't be enough to restore the application. File
+descriptors and network connections are tracked by the kernel and would
+need to be re-established on process restart. The network connections
+are especially problematic, as the remote end would be unaware of what
+was happening.
+
+Having applications involved in the process may allow some form of task
+suspension that reduces the perceived start-up time of a “frozen”
+application. In response to a signal (presumably over D-Bus) from the
+application manager, a running application could free easily re-created
+data, close down network connections and remain dormant until the
+application manager gave it a signal to resume regular operation.
+
+### Resource Usage
+
+To make better use of the available memory, it's recommended that
+applications listen to the cgroup notification
+[memory.usage_in_bytes]
+and, when it gets close to the limit for applications, start reducing
+the size of any caches they hold in main
+memory. It may be good to do this inside the SDK and provide
+applications with a [GLib object][GMemoryMonitor] that will notify them.
+
+In order to reduce the chances that the system will find itself in a
+situation where lack of disk space is problematic, it is recommended
+that available disk space is monitored and applications notified so they
+can react and modify their behavior accordingly. Applications may choose
+to delete unused files, delete or reduce cache files or purge old data
+from their databases.
+
+The recommended mechanism for monitoring available disk space is for a
+daemon running in the user session to call `statvfs(2)` periodically on
+each mount point and notify applications with a D-Bus signal. Example
+code can be found [in the GNOME project][gsd-disk-space],
+which uses a similar approach (polling every 60 seconds).
+
+In case applications cannot be trusted to properly delete non-essential
+files, a possibility would be for them to state in their
+[application bundle metadata] where
+such files will be stored, so the system can delete them when needed.
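+
+A minimal sketch of the polling daemon described above follows. The
+mount-point list and the threshold are illustrative assumptions, and a
+real daemon would emit a D-Bus signal rather than print a message:
+
+```c
+#include <stdio.h>
+#include <unistd.h>
+#include <sys/statvfs.h>
+
+#define LOW_SPACE_THRESHOLD (100ULL * 1024 * 1024) /* 100 MiB, illustrative */
+
+static void check_mount_point (const char *path)
+{
+  struct statvfs vfs;
+
+  if (statvfs (path, &vfs) != 0)
+    return;
+
+  /* f_bavail counts fragments (of f_frsize bytes each) available to
+   * unprivileged processes. */
+  unsigned long long avail = (unsigned long long) vfs.f_bavail * vfs.f_frsize;
+
+  if (avail < LOW_SPACE_THRESHOLD)
+    {
+      printf ("low space on %s: %llu bytes left\n", path, avail);
+      /* ...a real daemon would emit a D-Bus signal here, so that
+       * applications can trim caches or purge old data... */
+    }
+}
+
+int main (void)
+{
+  /* Illustrative set of mount points to watch. */
+  static const char *const mount_points[] = { "/Applications", "/var", "/home" };
+
+  for (;;)
+    {
+      for (unsigned i = 0; i < sizeof mount_points / sizeof mount_points[0]; i++)
+        check_mount_point (mount_points[i]);
+      sleep (60);
+    }
+}
+```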
+
+In order to make sure that malfunctioning applications cannot cause
+disruption by filling filesystems, it would be required that each
+application write to a separate filesystem.
+
+### Applications not Written for ***Apertis***
+
+It may be desirable to run applications (such as Google Earth) that were
+not written for Apertis. These applications won't understand any custom
+signals or APIs that Apertis provides – yet another reason to minimize
+those and stick to upstream solutions as much as possible.
+
+Non-Apertis applications should be treated as if they have the
+background permission – they should not be killed unless the system is
+extremely low on resources, as they don't save state like a native
+Apertis application and killing them would provide an inconsistent user
+experience.
+
+### Responsibilities of the Application Manager
+
+Application life-cycle is dictated by the application manager. When the
+user interacts with an icon or a link, the application manager is
+responsible for launching the appropriate application.
+
+Collabora recommends that the application manager also be responsible
+for content-type-based (MIME-type-based) launching. The manager could
+provide a D-Bus interface through which it can be asked to launch an
+appropriate application for a specific content type. A list of available
+handlers and their invocation details would be gathered from the
+application entry points.
+
+If multiple applications are capable of handling the same content-types,
+the user may wish to have a way to select which one takes precedence.
+One possible solution is to have the application manager provide a
+dialog any time an ambiguous content-type launch is required, allowing
+the users to choose their preferred handler, with a check-box that can
+be set to remember the selection. This is how the Android operating
+system handles this situation.
+
+There are security concerns when allowing content-type-based application
+launching – it can potentially lead to an untrusted source (like a web
+page) being capable of launching a store application with a known
+security bug.
+
+Giving the application manager control over content-type-based launches
+can allow it to restrict the usage of content-type-based launching. The
+manager would be able to deny certain applications the ability to launch
+handlers for specific data types (perhaps the web browser should never
+be allowed to launch a handler for a certain data type), filter the
+handlers available to an application to only allow trusted built-in
+applications to be used, or allow a system upgrade to blacklist a store
+application's handler while waiting for a fix from a third party.
+
+The application launcher is itself a built-in application, and as such
+its storage is governed by system rollbacks. In the event of a system
+rollback, all of the launcher's settings (icon placement, for example)
+will be automatically reset to the state they were in just prior to the
+last system upgrade.
+
+## References
+
+This document references the following external sources of information:
+
+  - XDG Base Directory Specification:
+    [*http://standards.freedesktop.org/basedir-spec/basedir-spec-latest.html*](http://standards.freedesktop.org/basedir-spec/basedir-spec-latest.html)
+
+  - Apertis System Updates and Rollback design
+
+  - Apertis Multiuser design
+
+  - Apertis Supported API design
+
+  - Apertis Preferences and Persistence design
+
+  - Eastlake 3rd, D. and A.
Panitz, "Reserved Top Level DNS Names", BCP
+    32, RFC 2606, DOI 10.17487/RFC2606, June 1999
+    ([*http://www.rfc-editor.org/info/rfc2606*](http://www.rfc-editor.org/info/rfc2606))
+
+<!-- Other Apertis design documents -->
+
+[Apertis Application Bundle Specification]: https://appdev.apertis.org/documentation/bundle-spec.html
+[Application bundle metadata]: application-bundle-metadata.md
+[Application entry points]: application-entry-points.md
+[Application layout]: application-layout.md
+[Permissions concept design]: permissions.md
+[Public system extensions]: application-layout.md#public-system-extensions
+[Security]: security.md
+
+<!-- External links -->
+
+[Android-manifest]: http://developer.android.com/reference/android/Manifest.permission.html
+
+[deb-triggers]: http://man7.org/linux/man-pages/man5/deb-triggers.5.html
+
+[Reversed domain name]: https://en.wikipedia.org/wiki/Reverse_domain_name_notation
+
+[XDG Base Directory Specification]: http://standards.freedesktop.org/basedir-spec/basedir-spec-latest.html
+
+[LUKS]: https://code.google.com/p/cryptsetup/
+
+[tar's checkpoint]: http://www.gnu.org/software/tar/manual/html_section/checkpoints.html
+
+[libosso]: http://maemo.org/api_refs/5.0/5.0-final/libosso/group__Statesave.html
+
+[memory.usage_in_bytes]: http://www.kernel.org/doc/Documentation/cgroups/memory.txt
+
+[GMemoryMonitor]: https://gitlab.gnome.org/GNOME/glib/merge_requests/1005
+
+[gsd-disk-space]: http://git.gnome.org/browse/gnome-settings-daemon/tree/plugins/housekeeping/gsd-disk-space.c#n693
diff --git a/content/designs/audio-management.md b/content/designs/audio-management.md
new file mode 100644
index 0000000000000000000000000000000000000000..b6ef88cdce4369449973a31fc06fb6d8c36aab16
--- /dev/null
+++ b/content/designs/audio-management.md
@@ -0,0 +1,712 @@
+---
+title: Audio management
+short-description: Overview of audio management in Apertis (concept)
+authors:
+  - name: Frederic Dalleau
+  - name: Emanuele Aina
+---
+
+# Audio management
+
+## Introduction
+
+Apertis audio management was previously built around PulseAudio but, with the
+move to the Flatpak-based application framework,
+[PipeWire](https://pipewire.org/) offers a better match for the use-cases
+below. Compared to PulseAudio, PipeWire natively supports containerized
+applications and keeps policy management separate from the core routing
+system, making it much easier to tailor for specific products.
+
+Applications can use PipeWire through
+[its native API](https://pipewire.github.io/pipewire/),
+as the final layer to access sound features. This does not mean that
+applications have to deal directly with PipeWire: applications can still make
+use of their preferred sound APIs as intermediate layers for manipulating
+audio streams, with support being available for the PulseAudio API, for
+GStreamer or for the ALSA API.
+
+In an analogous manner, applications can capture sound for various purposes.
+For instance, speech recognition or voice recorder applications may need to
+capture input from the microphone. The sound will be captured from PipeWire:
+ALSA users can use `pcm_pipewire`, and GStreamer users can use `pipewiresrc`.
+
+## Terminology and concepts
+
+See also the [Apertis glossary] for background information on terminology.
+
+### Standalone setup
+
+A standalone setup is an installation of Apertis which has full control of the
+audio driver. Apertis running in a virtual machine is an example of a
+standalone setup.
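+
+For instance, on a standalone setup such as the SDK, a voice recorder could
+capture from PipeWire with the `pipewiresrc` GStreamer element mentioned
+above. A minimal sketch (the pipeline string, output file and five-second
+duration are illustrative):
+
+```c
+#include <gst/gst.h>
+
+int
+main (int argc, char **argv)
+{
+  GError *error = NULL;
+
+  gst_init (&argc, &argv);
+
+  /* Capture from PipeWire and write a WAV file. */
+  GstElement *pipeline = gst_parse_launch (
+      "pipewiresrc ! audioconvert ! wavenc ! filesink location=capture.wav",
+      &error);
+  if (pipeline == NULL)
+    {
+      g_printerr ("Failed to build pipeline: %s\n", error->message);
+      return 1;
+    }
+
+  gst_element_set_state (pipeline, GST_STATE_PLAYING);
+  g_usleep (5 * G_USEC_PER_SEC);   /* record for five seconds */
+
+  /* Send EOS so wavenc finalizes the file, then wait for it on the bus. */
+  gst_element_send_event (pipeline, gst_event_new_eos ());
+  GstBus *bus = gst_element_get_bus (pipeline);
+  gst_message_unref (gst_bus_timed_pop_filtered (bus, GST_CLOCK_TIME_NONE,
+      GST_MESSAGE_EOS | GST_MESSAGE_ERROR));
+
+  gst_element_set_state (pipeline, GST_STATE_NULL);
+  gst_object_unref (bus);
+  gst_object_unref (pipeline);
+  return 0;
+}
+```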
+
+### Hybrid setup
+
+A hybrid setup is an installation of Apertis in which the audio driver is not
+fully controlled by Apertis. An example of this is when Apertis is running
+under a hypervisor or using an external audio router component such as the
+[GENIVI audio manager].
+In this case, the Apertis system can be referred to as the Consumer
+Electronics domain (CE), and the other domain can be referred to as the
+Automotive Domain (AD).
+
+### Different audio sources for each domain
+
+Among others, a standalone Apertis system can generate the following sounds:
+- Application sounds
+- Bluetooth sounds, for example music streamed from a phone or a voice call
+sent from a hands-free car kit
+- Any other kind of event sound; for example, somebody using the SDK can
+generate event sounds using an appropriate command line
+
+A hybrid Apertis system can generate the same sounds as a standalone system,
+plus some additional sounds not always visible to Apertis, for example
+hardware sources further down the audio pipeline such as:
+- FM Radio
+- CD Player
+- Driver assistance systems
+
+In this case, some interfaces should be provided to interact with the
+additional sound sources.
+
+### Mixing, corking, ducking
+
+*Mixing* is the action of playing simultaneously from several sound sources.
+
+*Corking* is a request from the audio router to pause an application.
+
+*Ducking* is the action of lowering the volume of a background source, while
+mixing it with a foreground source at normal volume.
+
+## Use cases
+
+The following section lists examples of usages requiring audio management.
+It is not an exhaustive list; countless combinations exist.
+Discussion points will be highlighted at the end of some use cases.
+
+### Application developer
+
+An application developer uses the SDK in a virtual machine to develop an
+application. They need to play sounds, and may also need to record sounds or
+test their application on a reference platform. This is a typical standalone
+setup.
+
+### Car audio system
+
+In a car, Apertis is running in a hypervisor, sharing the processor with a
+real-time operating system controlling the car operations. Apertis is only
+used for applications and web browsing. A sophisticated Hi-Fi system is
+installed under a seat and accessible via a network interface. This is a
+hybrid setup.
+
+### Different types of sources
+
+Some systems classify application sound sources in categories. It's important
+to note that no standard exists for those categories.
+
+Both standalone and hybrid systems can generate different sound categories.
+
+#### Example 1
+
+In one system of interest, sounds are classified as *main sources* and
+*interrupt sources*.
+Main sources will generally represent long-duration sound sources. The most
+common case is media players, but they could also be sound sources emanating
+from web radio, or games. As a rule of thumb, the following can be used: when
+two main sources are playing at the same time, neither is intelligible. Those
+will often require an action from the user to start playing, be it turning
+the ignition on, or pressing a play button on the steering wheel or the
+touchscreen.
+As a consequence, policy mechanisms ensure that only one main source can
+be heard at a time.
+
+Interrupt sources will generally represent short-duration sound sources; they
+are emitted when an unsolicited event occurs. This could be when a message is
+received in any application or email service.
+
+#### Example 2
+
+In another system of interest, sounds are classified as *main sources*,
+*interrupt sources* and *chimes*.
+Unlike the first example, in this system a source is considered a main source
+if it is an infinite or loopable source, which can only be interrupted by
+another main source such as FM radio or the CD player. Interrupt sources are
+informational sources such as navigation instructions, and chimes are
+unsolicited events of short duration.
+Each of these sound sources is not necessarily generated by an application.
+It could come from a system service instead.
+
+### Navigation instruction
+
+While some music from FM Radio is playing, a new navigation instruction has to
+be given to the driver: the navigation instructions should be mixed with
+the music.
+
+### Traffic bulletin
+
+Many audio sources can be paused. For example, a CD player can be paused, as
+can media files played from local storage (including USB mass storage), and
+some network media such as Spotify.
+
+While some music from one of these sources is playing, a new traffic bulletin
+is issued: the music could be paused and the traffic bulletin should be heard.
+When it is finished, the music can continue from the point where the playback
+was paused.
+
+By their nature, some sound sources cannot be paused. For example, FM or DAB
+radio cannot be paused.
+
+While some music from FM or DAB radio is playing, a new traffic bulletin is
+issued. Because the music cannot be paused, it should be silenced and the
+traffic bulletin should be heard. When it is finished, the music can be heard
+again.
+
+Bluetooth can be used when playing a game or watching live TV. As with the
+radio use-case, Bluetooth cannot be paused.
+
+### USB drive
+
+While some music from the radio is playing, a new USB drive is inserted. If
+the *automatic playback from USB drive* setting is enabled,
+the radio sound stops and the USB playback begins.
+
+### Rear sensor sound
+
+While some music from the radio is playing, the driver selects reverse gear:
+the rear sensor sound can be heard mixed with the music.
+
+### Blind spot sensor
+
+While some music from Bluetooth is playing, a car passes through the driver's
+blind spot: a short notification sound can be mixed with the music.
+
+### Seat belt
+
+While some music from the CD drive is playing, the passenger removes their
+seat belt: a short alarm sound can be heard mixed with the music.
+
+### Phone call
+
+While some music from the CD drive is playing, a phone call is received: the
+music should be paused so that the phone can be heard ringing and the call
+can be answered. In this case, another possibility could be to announce the
+phone call using a ring sound mixed into the music, and then pause the music
+only if the call is answered.
+
+### Resume music
+
+If music playback has been interrupted by a phone call and the phone call has
+ended, music playback can be resumed.
+
+### VoIP
+
+The driver wishes to use internet telephony/VoIP without noticing any
+difference due to being in a car.
+
+### Emergency call priority
+
+While a phone call to emergency services is ongoing, an app-bundle process
+attempts to initiate lower-priority audio playback, for example
+playing music. The lower-priority audio must not be
+heard. The application receives the information that it cannot play.
+
+### Mute
+
+The user can press a [mute hard-key]. In this case, and according to
+OEM-specific rules, all sources of a specific category could be muted. For
+example, all *main* sources could be muted.
+The OEM might require that some sources are never muted even if the user
+pressed such a hard-key.
+
+### Audio recording
+
+Some apps might want to initiate speech recognition.
+They need to capture input from a microphone.
+<!-- FIXME: Link to the speech recognition design once complete (T889). -->
+
+### Microphone mute
+
+If the user presses a "mute microphone" button (sometimes referred to as a
+"secrecy" button) during a phone call, the sound coming from the microphone
+should be muted. Likewise, if the user presses this button in an application
+during a video conference call, the sound coming from the microphone should be
+muted.
+
+### Application crash
+
+The Internet Radio application is playing music. It encounters a problem and
+crashes. The audio manager should know that the application no longer exists.
+In a hybrid use case, the other audio routers could be informed that the audio
+route is now free. It is then possible to fall back to a default source.
+
+### Web applications
+
+Web applications should be able to play a stream or record a stream.
+
+### Control malicious application
+
+An application should not be able to use an audio role for which it does not
+have permission. For example, a malicious application could try to simulate a
+phone call and deliver advertising.
+
+### Multiple roles
+
+Some applications can receive both a standard media stream and traffic
+information.
+
+### External audio router
+
+In order to decide priorities, an external audio router can be involved.
+In this case, Apertis would only be providing a subset of the possible audio
+streams, and an external audio router could take policy decisions, to which
+Apertis could only conform.
+
+## Non-use-cases
+
+### Automatic actions on streams
+
+It is not the purpose of this document to discuss the action taken on a media
+stream when it is preempted by another stream. Deciding whether to cork or
+silence a stream is a user interface decision; as such, it is OEM-dependent.
+
+### Streams' priorities
+
+The audio management framework defined by this document is intended to provide
+mechanism, not policy: it does not impose a particular policy, but instead
+provides a mechanism by which OEMs can impose their chosen policies.
+
+### Multiple independent systems
+
+Some luxury cars may have multiple IVI touchscreens and/or sound systems,
+sometimes referred to as
+[multi-seat](https://designs.apertis.org/latest/multiuser.html#multi-seat-logind-seats)
+(please note that this jargon term comes
+from desktop computing, and one of these "seats" does not necessarily
+correspond to a space where a passenger could sit). We will assume that each
+of these "seats" is a separate container, virtual machine or physical device,
+running a distinct instance of the Apertis CE domain.
+
+## Requirements
+
+### Standalone operation
+
+The audio manager must support standalone operation, in which it accesses
+audio hardware directly ([][Application developer]).
+
+### Integrated operation
+
+The audio manager must support integrated operation, in which it cannot access
+the audio hardware directly, but must instead send requests and audio streams
+to the hybrid system ([][Different types of sources],
+[][External audio router]).
+
+### Priority rules
+
+It must be possible to implement OEM-specific priority rules, in which it is
+possible to consider one stream to be higher priority than another.
+
+When a lower-priority stream is pre-empted by a higher-priority stream, it
+must be possible for the OEM-specific rules to choose between at least these
+actions:
+
+* silence the lower-priority stream, with a notification to the application
+  so that it can pause or otherwise minimise its resource use (corking)
+* leave the lower-priority stream playing, possibly with reduced volume
+  (ducking)
+* terminate the lower-priority stream altogether
+
+It must be possible for the audio manager to lose the ability to play audio
+(audio resource deallocation). In this situation, the audio manager must
+notify the application with a meaningful error.
+
+When an application attempts to play audio and the audio manager is unable to
+allocate a necessary audio resource (for example because a higher-priority
+stream is already playing), the audio manager must inform the application
+using an appropriate error message. ([][Emergency call priority])
+
+### Multiple sound outputs
+
+The audio manager should be able to route sounds to several sound outputs.
+([][Different types of sources])
+
+### Remember preempted source
+
+It should be possible for an audio source that was preempted
+to be remembered, in order to resume it after the interruption.
+This is not a necessity for all types of streams; some OEM-specific code could
+select those streams based on their roles. ([][Traffic bulletin],
+[][Resume music])
+
+### Audio recording
+
+App-bundles must be able to record audio if given appropriate permission.
+([][Audio recording])
+
+### Latency
+
+The telephony latency must be as low as possible. The user must be able to
+hold a conversation on the phone or in a VoIP application without noticing any
+form of latency. ([][VoIP])
+
+### Security
+
+The audio manager should not trust applications to manage audio correctly. If
+some faulty or malicious application tries to play or record an audio stream
+for which permission wasn't granted, the proposed audio management design
+should not allow it. ([][Application crash],
+[][Control malicious application])
+
+### Muting output streams
+
+During the time an audio source is preempted, the audio framework must notify
+the application that is providing it, so that the application can make an
+attempt to reduce its resource usage. For example, a DAB radio application
+might stop decoding the received DAB data.
+([][Mute], [][Traffic bulletin])
+
+### Muting input streams
+
+The audio framework should be able to mute capture streams. During that time,
+the audio framework must notify the applications that are using them, so that
+each application can update its user interface and reduce its resource usage.
+([][Microphone mute])
+
+### Control source activity
+
+Audio management should be able to set each audio source to the playing,
+stopped or paused state based on priority. ([][Resume music])
+
+### Per stream priority
+
+We might want to mix and send multiple streams from one application to the
+automotive domain.
+An application might want to send different types of alerts. For instance, a
+new message notification may have higher priority than "some contact published
+a new photo".
+([][Multiple roles])
+
+### GStreamer support
+
+GStreamer should be supported.
+
+## Approach
+
+PulseAudio embeds a default audio policy: for instance, if you plug a
+headset into your laptop's aux slot, it silences the laptop speakers.
PipeWire
+has no embedded logic to do that, and relies on something else to control it.
+This suits the needs of Apertis better, since Apertis also targets special
+use-cases that don't really match the desktop ones, and this separation brings
+more flexibility.
+
+[WirePlumber](https://gitlab.freedesktop.org/pipewire/wireplumber) is a
+service that provides the policy logic for PipeWire. It's where default
+policies like the one above are implemented but, unlike PulseAudio, it is
+explicitly designed to let people customize it. PipeWire and WirePlumber are
+what AGL used to replace its previous audio manager in its latest Happy
+Halibut 8.0.0 release.
+
+The overall approach is to adopt WirePlumber as the reference solution,
+but the separation between audio management and audio policy means that
+product teams can replace it with a completely different implementation
+with ease.
+
+### Streams metadata in applications
+
+PipeWire provides the ability to attach metadata to a stream.
+The function
+[`pw_fill_stream_properties()`](https://pipewire.github.io/pipewire/classpw__pipewire.html#a841dbb7608dc9cdda4a320d33fbbd39a)
+is used to attach metadata to a stream during creation.
+The current convention is to use a metadata entry named `media.role`,
+which can be set to values describing the nature of the stream, such as
+`Movie`, `Music`, `Camera`, `Notification`, and others defined by PipeWire.
+
+See also [GStreamer support].
+
+### Requesting permission to use audio in applications
+
+Each audio role is associated with a permission. Before an application can
+start playback of a stream, the audio manager will check whether it has the
+permission to do so. See [Identification of applications].
+[Application bundle metadata] describes how to manage the permissions
+requested by an application.
+The application can also use bundle metadata to store the default role
+used by all streams in the application if this is not specified at the stream
+level.
+
+### Audio routing principles
+
+The request to open an audio route is emitted in two cases:
+- when a new stream is created
+- before a stream changes state from Paused to Playing (uncork)
+
+In both cases, before starting playback, the audio manager must check the
+priority against the business rules, or request the
+appropriate priority from the external audio router. If the authorization is
+not granted, the application should stop trying to request the stream and
+notify the user that an undesirable event occurred.
+
+If an application stops playback, the audio manager will be informed. It will
+in turn notify the external audio router of the new situation, or handle it
+according to business rules.
+
+An application that has playback can be requested to pause by the audio
+manager, for example if a higher priority sound must be heard.
+
+Applications can use the PipeWire event API to subscribe to events. In
+particular, applications can be notified about their mute status.
+If an event occurs, such as mute or unmute, the callback will be executed.
+For example, an application playing media from a source such as a CD or USB
+storage would typically respond to the mute event by pausing playback, so that
+it can later resume from the same place. An application playing a live source
+such as on-air FM radio cannot pause in a way that can later be resumed from
+the same place, but would typically respond to the mute event by ceasing to
+decode the source, so that it does not waste CPU cycles by decoding audio that
+the user will not hear.
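+
+As a rough illustration of both points above (role metadata at stream
+creation, and reacting to state changes such as corking), here is a minimal
+sketch against the PipeWire 0.3 C API; the stream name, the chosen role and
+the omitted error handling and audio processing are all illustrative:
+
+```c
+#include <pipewire/pipewire.h>
+
+/* React to stream state changes: PAUSED here corresponds to being corked
+ * by the policy manager, STREAMING to being uncorked. */
+static void
+on_state_changed (void *data, enum pw_stream_state old,
+                  enum pw_stream_state state, const char *error)
+{
+  if (state == PW_STREAM_STATE_PAUSED)
+    {
+      /* Corked: stop decoding and remember the playback position. */
+    }
+  else if (state == PW_STREAM_STATE_STREAMING)
+    {
+      /* Uncorked: resume decoding from the remembered position. */
+    }
+}
+
+static const struct pw_stream_events stream_events = {
+  PW_VERSION_STREAM_EVENTS,
+  .state_changed = on_state_changed,
+};
+
+int
+main (int argc, char **argv)
+{
+  pw_init (&argc, &argv);
+
+  struct pw_main_loop *loop = pw_main_loop_new (NULL);
+
+  /* Tag the stream with a media.role at creation time. */
+  struct pw_stream *stream = pw_stream_new_simple (
+      pw_main_loop_get_loop (loop), "music-player",
+      pw_properties_new (PW_KEY_MEDIA_ROLE, "Music", NULL),
+      &stream_events, NULL);
+
+  /* pw_stream_connect() and a process callback providing the actual
+   * audio samples would follow here. */
+
+  pw_main_loop_run (loop);
+
+  pw_stream_destroy (stream);
+  pw_main_loop_destroy (loop);
+  return 0;
+}
+```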
+
+#### Standalone routing module maps streams metadata to priority
+
+An internal priority module can be written. This module would associate a
+priority with each of the different streams' metadata values. It is loaded
+statically from the config file. See [Routing data structure example] for an
+example of the data structure.
+
+#### Hybrid routing module maps stream metadata to external audio router calls
+
+In the hybrid setup, the audio routing functions could be
+implemented in a separate module that maps audio events to automotive domain
+calls. However, this module does not perform the priority checks. Those are
+executed in the automotive domain, because they can involve a different
+feature set.
+
+### Identification of applications
+
+Flatpak applications are wrapped in containers and are identified by a unique
+app-id which can be used by the policy manager. This app-id is encoded in the
+name of the [transient systemd scope wrapping each application
+instance](https://github.com/flatpak/flatpak/wiki/Sandbox#the-current-flatpak-sandbox)
+and can be retrieved easily.
+
+If AppArmor support is added to Flatpak, AppArmor profiles could also be used
+to securely identify applications.
+
+#### Web application support
+
+Web applications are just like any other application. However, the web engine
+JavaScript API
+does not provide a way to select the media role. All streams emanating from
+the same web application bundle would thus have the same role. Since each web
+application is running in its own process, AppArmor can be used to
+differentiate them.
+Web application support for corking depends on the underlying engine.
+WebKitGTK+ has the necessary support.
+See [changeset 145811](https://trac.webkit.org/changeset/145811).
+
+### Implementation of priority within streams
+
+The policy manager should be able to cork streams: when a new stream with
+a certain role is started, all other streams within a user-defined list of
+roles will get corked.
+
+### Corking streams
+
+Depending on the audio routing policy, audio streams might be "corked",
+"ducked" or simply silenced (moved to a null sink).
+
+As long as the role is properly
+defined, the application developer does not have to worry about what happens
+to the stream, except for corking. Corking is part of the PipeWire API and can
+happen at any time. Corking *should* be supported by applications. It is even
+possible for a stream to be corked before being started.
+
+If an application is not able to cork itself,
+the audio manager should enforce corking by muting the stream as soon as
+possible. However, this has the side effect that any audio played between the
+corking request and the effective corking in the application will be lost. A
+threshold delay can be implemented to give an application enough time to cork
+itself. The policy of the external audio manager must also be considered: if
+that audio manager has already closed the audio route when notifying the user,
+then the data will already be discarded. If the audio manager synchronously
+requests the pause, then the application can take the appropriate time to shut
+down.
+
+#### Ensuring a process does not override its priorities
+
+In addition to requesting that a stream be corked, the stream could be muted,
+so that any data still being received would be silenced.
+
+### GStreamer support
+
+GStreamer support is straightforward: `pipewiresink` supports the
+`stream-properties` parameter. This parameter can be used to specify the
+`media.role`.
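+
+For example, a player could tag its sink as follows (a minimal sketch; the
+test source and the chosen role value are illustrative):
+
+```c
+#include <gst/gst.h>
+
+int
+main (int argc, char **argv)
+{
+  gst_init (&argc, &argv);
+
+  GstElement *pipeline = gst_pipeline_new ("player");
+  GstElement *src = gst_element_factory_make ("audiotestsrc", NULL);
+  GstElement *sink = gst_element_factory_make ("pipewiresink", NULL);
+
+  /* Attach the media.role via the stream-properties parameter. */
+  GstStructure *props = gst_structure_new ("props",
+      "media.role", G_TYPE_STRING, "Music", NULL);
+  g_object_set (sink, "stream-properties", props, NULL);
+  gst_structure_free (props);
+
+  gst_bin_add_many (GST_BIN (pipeline), src, sink, NULL);
+  gst_element_link (src, sink);
+  gst_element_set_state (pipeline, GST_STATE_PLAYING);
+
+  /* Run until interrupted. */
+  GMainLoop *loop = g_main_loop_new (NULL, FALSE);
+  g_main_loop_run (loop);
+  return 0;
+}
+```
+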
+The GStreamer pipeline state already changes from `GST_STATE_PLAYING` to
+`GST_STATE_PAUSED` when corking is requested.
+
+### Remembering the previously playing stream
+
+If a stream was playing and has been preempted, it may be desirable to switch
+back to this stream after the higher priority stream is terminated.
+To that effect, when a new stream starts playing, a pointer to the stream that
+was playing until then (or its ID) could be stored on a stack.
+The termination of a playing stream could then restore playback of the
+previously suspended stream.
+
+### Using different sinks
+
+A specific `media.role` metadata value should be associated with a priority
+and a target sink. This makes it possible to implement the requirement of one
+sink per stream category: for example, one sink for main streams and another
+sink for interrupt streams.
+The default behavior is to mix together all streams sent to the same sink.
+
+### Routing data structure example
+
+The following tables document routing data defining an A-IVI-inspired stream
+routing. This is an example; in an OEM variant of Apertis it would be replaced
+with the business rules that fulfill the OEM's requirements.
+
+App-bundle metadata defines whether the application is allowed to use a given
+audio role; if not defined, the application is not allowed to use the role.
+From the role, priorities between streams could be defined as follows.
+
+In a standalone setup:
+
+| role            | priority    | sink       | action |
+|-----------------|-------------|------------|--------|
+| music           | 0 (lowest)  | main_sink  | cork   |
+| phone           | 7 (highest) | main_sink  | cork   |
+| ringtone        | 7 (highest) | alert_sink | mix    |
+| customringtone  | 7 (highest) | main_sink  | cork   |
+| new_email       | 1           | alert_sink | mix    |
+| traffic_info    | 6           | alert_sink | mix    |
+| gps             | 5           | main_sink  | duck   |
+
+In a hybrid setup, the priority would be expressed in a form understandable
+by the automotive domain.
+The meaning of the action would remain internal to the CE domain, since the
+CE domain does not know what is happening in the automotive domain.
+
+| role            | priority  | sink       | action |
+|-----------------|-----------|------------|--------|
+| music           | MAIN_APP1 | main_sink  | cork   |
+| phone           | MAIN_APP2 | main_sink  | cork   |
+| ringtone        | MAIN_APP3 | alert_sink | mix    |
+| customringtone  | MAIN_APP3 | main_sink  | cork   |
+| new_email       | ALERT1    | alert_sink | mix    |
+| traffic_info    | INFO1     | alert_sink | mix    |
+| gps             | INFO2     | main_sink  | mix    |
+
+### Testability
+
+The key point to keep in mind for testing is that several applications can
+execute in parallel and use the PipeWire APIs (and the library API)
+concurrently. The testing should try to replicate this.
+However, testing possibilities are limited, because the testing result depends
+on the audio policy.
+
+#### Application developer testing
+
+The application developer is requested to implement corking and the error
+paths. Testing those features will depend on the policy in use.
+
+Having a way to identify the *lowest* and *highest* priority definitions in
+the policy could be enough for the application developer.
+Starting a stream with the lowest priority would not succeed if a stream is
+already running.
+Starting a stream with the highest priority would cork all running streams.
+
+The developer may benefit from the possibility of customizing the running
+policy.
+
+#### Testing the complete design
+
+Testability of the complete design must be exercised from the application
+level.
+It consists of emulating several applications, each creating independent
+connections with different priorities, and verifying that the interactions
+are reliable.
+The policy module could be provisioned with a dedicated test policy for which
+the results are already known.
+
+### Requirements
+
+This design fulfills the following requirements:
+
+* [Standalone operation] and [Integrated operation] are provided using
+  separate sets of configuration files.
+* [Priority rules] are provided by the policy manager.
+* The audio manager library interface is aware of [Multiple sound outputs].
+* [Remember preempted source] can be implemented in the policy manager.
+* [Audio recording] will use the same mechanisms.
+* [Latency] is addressed by not adding an additional audio processing layer.
+* [Security] is implemented by relying on the Flatpak containerization, which
+  could be further hardened by adding AppArmor support.
+* [Muting output streams] and [Control source activity] use the PipeWire
+  corking infrastructure.
+* [Per stream priority] uses the PipeWire API.
+* [GStreamer support] is provided indirectly thanks to existing
+  plugins.
+
+## Open questions
+
+### Roles
+
+- Do we need to define roles that the application developer can use?
+
+  It's not possible to guarantee that an OEM's policies will not nullify
+  an audio role that is included in Apertis.
+  However, if we do not provide some roles, there is no hope of ever having
+  an application designed for one system work gracefully on another.
+
+- Should we define roles for input?
+
+  Probably, yes: speech recognition input could have a
+  higher priority than phone call input. (Imagine the use case where someone
+  is taking a call, is not currently talking on the call, and wants to change
+  their navigation destination: they press the speech recognition hard-key,
+  tell the navigation system to change destination, then input switches back
+  to the phone call.)
+
+- Should we define one or several audio roles not requiring permission for
+  use?
+
+  No, it is explicitly recommended that every audio role requires permission.
+  An app-store curator from the OEM could still give permission to every
+  application requesting a role.
+
+### Policies
+
+- How can we ensure matching between the policy and application-defined roles?
+
+  Each permission in the permission set should be matched with a media role.
+  The number of different permissions should be kept to a minimum.
+
+- Should applications start streams corked?
+
+  It must be done on both the application side and the audio manager side.
+  Applications cannot be trusted. As soon as a stream opens, the PipeWire
+  process must cork it – before the first sample comes out. Otherwise a
+  malicious application could play undesirable sounds or noises
+  while the audio manager is still thinking about what to do with that
+  stream. The audio manager might be making this decision
+  asynchronously, by asking for permission from the automotive domain.
+  The audio manager can choose to uncork, leave corked or kill, according to
+  its policies.
+  On the application side, it is only possible to *suggest* the best way for
+  an application to behave in order to obtain the best user experience.
+
+- Should we use `media.role` or define an Apertis-specific stream property?
+
+## Summary of recommendations
+
+- PipeWire is adopted as the audio router and WirePlumber as the policy
+  manager.
+- Applications keep using the PulseAudio API or higher level APIs like
+  GStreamer to be compatible with the legacy system.
+- The default WirePlumber policy is extended to address the use-cases
+  described here.
+- Static sets of configuration files can implement different policies,
+  depending on whether the setup is hybrid or standalone.
+- Each OEM must derive from those policies to implement their business rules.
+- WirePlumber must be modified to check that the application has the
+  permission to use the requested role and, if the `media.role` is not
+  provided in the stream, it must check whether a default value is provided in
+  the application bundle metadata.
+- If AppArmor support is made available in Flatpak, WirePlumber must be
+  modified to check the AppArmor identity of client applications.
+- The application bundle metadata contains a default audio role for all
+  streams within an application.
+- The application bundle metadata must contain a permission request for each
+  audio role in use in an application.
+- For each stream, an application can choose an audio role and communicate it
+  to PipeWire at stream creation.
+- The policy manager monitors creation and state changes of streams.
+- Depending on business rules, the policy manager can request an application
+  to cork or mute.
+- GStreamer's `pipewiresink` supports a `stream-properties` parameter.
+- A tool for corking a stream could be implemented.
+
+[Apertis Glossary]: https://wiki.apertis.org/Glossary
+[inter domain communication]: https://designs.apertis.org/latest/inter-domain-communication.html
+[Application bundle metadata]: https://designs.apertis.org/latest/application-bundle-metadata.html
+[Multimedia]: https://designs.apertis.org/latest/multimedia.html
+[GENIVI audio manager]: http://docs.projects.genivi.org/AudioManager/
+[mute hard-key]: https://wiki.apertis.org/HardKeys
diff --git a/content/designs/automated-license-compliance.md b/content/designs/automated-license-compliance.md
new file mode 100644
index 0000000000000000000000000000000000000000..855b7e715c39c2befdde83759add4ad339940b5e
--- /dev/null
+++ b/content/designs/automated-license-compliance.md
@@ -0,0 +1,134 @@
+---
+title: Automated License Compliance
+short-description: Automated process for OSS compliance in Apertis
+authors:
+  - name: Martyn Welch
+---
+
+# Automated License Compliance
+
+A Linux system such as those assembled by Apertis contains components licensed
+under many different licenses.
+These various licenses impose different conditions, and it is important to
+understand, to a good degree of fidelity, the terms under which each component
+is provided.
+We are proposing to implement an automated process to generate software Bills
+Of Materials (BOMs) which detail both the components used in Apertis and the
+licensing that applies to them.
+Licensing isn't static, nor is it always as simple as all the components from
+a given source package deriving the same license.
+Packages have been known to change licenses and/or provide various existing or
+new components under different terms.
+Either now or at some point in the future, the licenses of some of the
+components in Apertis may start to be provided under
+[terms that Apertis may wish to avoid](https://designs.apertis.org/latest/license-expectations.html).
+For example, by default Apertis is careful not to include components to be
+used in the target system that are licensed under the GPL version 3, as those
+licensing terms wouldn't be acceptable in Apertis' target markets.
+
+In order to take advantage of new functionality and support being developed in
+the software community, Apertis needs to incorporate newer versions of
+existing software packages and replace some with alternatives when better or
+more suitable components are created.
+To ensure that the licensing conditions remain favorable for the use cases
+targeted by Apertis, it is important to continually validate the licensing
+terms under which these components are provided.
+These licensing terms should be documented in a way that is accessible to
+Apertis' users.
+
+Debian packages by default track licensing on a per source package level.
+The suitability of a package is decided at that level before it is included in
+Debian, which meets the project's
+[licensing goals](https://www.debian.org/social_contract.html#guidelines).
+Apertis will continue to evaluate licensing before the inclusion of source
+packages in the distribution, but also wishes to take a more nuanced approach,
+tracking licensing for each file in each of its binary packages.
+By tracking licensing to this degree we can look to exclude components with
+unsatisfactory licensing from the packages intended for distributed target
+systems, whilst still packaging them separately so they may be utilized during
+development.
+A good example of this situation is the `gcc` source package and the `libgcc1`
+binary package produced by it.
+Unlike the other artifacts produced by the `gcc` source package, the `libgcc1`
+binary package is not licensed under the stock GPLv3 license: a
+[runtime exception](https://designs.apertis.org/latest/license-exceptions.html#gcc8)
+is provided, and it is thus fine to ship it on target devices.
+The level of tracking we are proposing will detect such situations and will
+offer a straightforward way to resolve them, maintaining compliance with the
+licensing requirements.
+
+To achieve this, two main steps need to be taken:
+
+- Record the licensing of the project source code, per file
+- Determine the mapping between source code files and the binary/data files in
+  each binary package
+
+We recommend integrating these steps into our CI pipelines to provide early
+detection of any change to the licensing status of each package. Extending our
+CI pipelines will also enable developers to learn about new issues and to
+solve them during the merge request development flow.
+
+## License scanners
+
+There are various proprietary and open source tools which can help with
+tracking the licensing terms that apply to the pieces of software from which
+Apertis is built.
+The following tools are examples of those that can help to achieve the first
+of the steps outlined above:
+
+- [Dependency-Track](https://dependencytrack.org/): Open source, higher level
+  tool for presenting data from BOMs generated by other software.
+- [FOSSA](https://fossa.com/): Proprietary suite for license compliance
+  tracking and management.
+- [FOSSID](https://fossid.com/): Proprietary license scanning tool.
+- [FOSSology](https://www.fossology.org/): Open source, server based tool,
+  utilizing a number of techniques to extract licensing information from
+  source and binary artifacts.
+- [Licensecheck](https://metacpan.org/pod/App::Licensecheck): A simple open
+  source license checker.
+- [Licensee](https://github.com/licensee/licensee): Open source tool, limited
+  to scanning for license files.
+- [Ninka](http://ninka.turingmachine.org/): Very limited, lightweight, open
+  source tool, developed as a research project aimed at identifying licenses
+  in source code.
+- [Protex](https://www.integrauae.com/datasheets/Protex_UL.pdf): Part of the
+  Black Duck suite of proprietary tools for managing open source compliance.
+- [ScanCode](https://www.nexb.com/): Suite of open source tools, which provide
+  a foundation on which the company developing them provides its proprietary
+  enterprise solution.
+- [WhiteSource](https://www.whitesourcesoftware.com/): Proprietary suite for
+  open source component management.
+
+Due to the open source nature of the Apertis project, we intend to utilize an
+open source tool for license compliance rather than a proprietary solution.
+Given the traction, community, and Linux Foundation involvement, our suggested
+open source tool for license scanning is FOSSology.
+
+### FOSSology
+
+FOSSology is a server based tool which provides a web front-end that is able
+to scan through source code (and, to a degree, binaries) provided to it,
+finding license statements and texts.
+To achieve this, FOSSology employs a number of different scanning techniques
+to identify potential licenses, including matching against known license texts
+and keywords.
+The scanning process errs on the side of caution, preferring false positives
+over missing potential licensing information; as a result, it will be
+necessary to "clear" the licenses that are found, deciding whether the matches
+are valid or not.
+This is likely to be a very time-consuming process, though bulk recognition of
+identical patterns may provide some efficiencies.
+Once completed, FOSSology will record the licensing decisions and can apply
+this information to updated scans of the source.
+It is anticipated that, after an initial round of verification, FOSSology will
+only require additional clearing of license information should the scan detect
+new sources of potential licensing information in an updated project's source
+or when new packages are added to Apertis.
+It is possible to export and import reports which contain the licensing
+decisions that have previously been made; if a trusted source of reports can
+be found, then these could also be imported, potentially reducing the work
+required.
+
+FOSSology is backed by the Linux Foundation and appears to have an active user
+and developer base and a significant history.
+As such, it is felt that this tool is likely to be maintained for the
+foreseeable future and thus a good choice for integration into the Apertis
+workflow.
+
+### CI Pipeline integration
+
+In order to avoid manual tasks, the license detection should be integrated
+into the CI process.
+FOSSology provides a
+[REST API](https://www.fossology.org/get-started/basic-rest-api-calls/) to
+enable such integration.
+
+FOSSology is able to consume branches of git repositories, thus allowing
+scanning of the given source code straight from GitLab.
+It is suggested that this should be triggered after normal build checks have
+been successfully performed.
+A report will be generated and retrieved using the REST API, which describes
+(among other things) the licensing status of each file.
+The report can be generated in a number of formats, including various SPDX
+flavors that are easily machine parsable, which will be the preferred option.
+It is suggested that each component should require a determination of the
+licensing to have been made for every file in the project.
+Due to the large volume of licensing matches that will result from the initial
+licensing scan, we recommend that the absence of license information initially
+generates a warning.
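+
+As an illustration of how a CI job might drive FOSSology over its REST API,
+here is a minimal libcurl sketch. The endpoint URL, token and JSON payload
+are illustrative placeholders only; the real schema is described in the
+FOSSology REST API documentation linked above:
+
+```c
+#include <curl/curl.h>
+#include <stdio.h>
+
+int
+main (void)
+{
+  curl_global_init (CURL_GLOBAL_DEFAULT);
+
+  CURL *curl = curl_easy_init ();
+  if (curl == NULL)
+    return 1;
+
+  /* Illustrative endpoint and credentials. */
+  struct curl_slist *headers = NULL;
+  headers = curl_slist_append (headers, "Authorization: Bearer <token>");
+  headers = curl_slist_append (headers, "Content-Type: application/json");
+
+  curl_easy_setopt (curl, CURLOPT_URL,
+                    "https://fossology.example.com/repo/api/v1/jobs");
+  curl_easy_setopt (curl, CURLOPT_HTTPHEADER, headers);
+  /* Illustrative payload: queue scan agents on a previously made upload. */
+  curl_easy_setopt (curl, CURLOPT_POSTFIELDS,
+                    "{ \"analysis\": { \"nomos\": true, \"monk\": true } }");
+
+  CURLcode res = curl_easy_perform (curl);
+  if (res != CURLE_OK)
+    fprintf (stderr, "request failed: %s\n", curl_easy_strerror (res));
+
+  curl_slist_free_all (headers);
+  curl_easy_cleanup (curl);
+  curl_global_cleanup ();
+  return res == CURLE_OK ? 0 : 1;
+}
+```
+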
+In some cases, to achieve the fine grained licensing information desired, the
+licensing of some files may need to be clarified with the component's
+author(s).
+Once an initial pass of all Apertis components has been made, we would expect
+missing license information to result in an error, as such errors would be the
+result of new matches being found, which would need to be resolved in
+FOSSology before CI would complete without an error.
+The generated report should be saved in the Debian metadata archive so that it
+is available for the following processing.
+
+## Binary to source file mapping
+
+Now that we have a way to determine the licensing of the source files, we need
+a way to determine which of these source files were used in each binary.
+Compilers store information in the binaries they output that can be used by a
+debugger to pause execution of a process at a point corresponding to a
+selected line of source code.
+This information provides a mapping between the lines of source code and the
+compiled machine code operations.
+Executable binaries in Linux are generally stored in the
+[Executable and Linkable Format](https://en.wikipedia.org/wiki/Executable_and_Linkable_Format)
+(ELF); the associated [DWARF](https://en.wikipedia.org/wiki/DWARF) debugging
+data format is generally used to store this debugging information inside the
+ELF in specific "debug" sections.
+
+By parsing this information, the source files that were used to generate each
+binary can be determined.
+Combining this with the licensing information provided in the licensing
+report, a mapping can be made between each binary and its associated licenses.
+
+### CI Pipeline integration
+
+Apertis uses the Open Build Service (OBS) platform to build the binary
+packages in a controlled manner across several architectures and releases.
+OBS utilizes `dpkg-buildpackage` behind the scenes to build each package.
+This utility will have access to the source licensing report, as it is
+contained in the Debian metadata archive.
+As well as the source licensing, the Debian metadata archive contains
+configuration to help `dpkg-buildpackage` determine how to build the source.
+This is typically done with the help of
+[`debhelper`](https://manpages.debian.org/jessie/debhelper/debhelper.7.en.html),
+which provides helpers that simplify this process.
+We plan to extend `debhelper` to include a command to perform the mapping
+between the binary files produced by the build and the licenses of the
+associated source files, using the process laid out above, and recording this
+for each of the binary packages to be made.
+In addition, this helper should record the licensing attached to any other
+files that will be packaged as well.
+Typically the binaries are stripped (using a debhelper command called
+`dh_strip`) prior to packaging, removing the debug symbols from the binary and
+reducing its size.
+Whilst the debug symbols are kept, packaged separately in the `dbgsym`
+package, it is easier to perform the license mapping before this stripping
+step is carried out.
+A report should be saved in each binary package covering the files shipped in
+that package.
+The report should be saved in `/usr/share/doc/<package>/` in a
+machine-parsable SPDX format.
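+
+The debug-information parsing described at the start of this section can be
+done with any DWARF reader. As a minimal sketch (using elfutils' libdw here
+purely as an illustration), the source files recorded in a binary's debug
+information can be listed like this:
+
+```c
+#include <elfutils/libdw.h>
+#include <fcntl.h>
+#include <stdio.h>
+#include <unistd.h>
+
+/* List the source files recorded in the DWARF debug info of a binary:
+ * iterate over the compilation units and print each unit's file table.
+ * Build with -ldw. */
+int
+main (int argc, char **argv)
+{
+  int fd = open (argv[1], O_RDONLY);
+  Dwarf *dw = dwarf_begin (fd, DWARF_C_READ);
+  if (dw == NULL)
+    return 1;
+
+  Dwarf_Off off = 0, next;
+  size_t header_size;
+
+  while (dwarf_nextcu (dw, off, &next, &header_size,
+                       NULL, NULL, NULL) == 0)
+    {
+      Dwarf_Die cu;
+      Dwarf_Files *files;
+      size_t nfiles;
+
+      if (dwarf_offdie (dw, off + header_size, &cu) != NULL
+          && dwarf_getsrcfiles (&cu, &files, &nfiles) == 0)
+        for (size_t i = 0; i < nfiles; i++)
+          printf ("%s\n", dwarf_filesrc (files, i, NULL, NULL));
+
+      off = next;
+    }
+
+  dwarf_end (dw);
+  close (fd);
+  return 0;
+}
+```
+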
+The new debhelper command will need to be added to the build rules for each
+package.
+Whilst most packages make use of debhelper, many do so via higher-level
+helpers that factor out common functionality, such as `dh` and `CDBS`, and
+this will add complexity to this task.
+There may be packages in Apertis that do not make use of debhelper; these
+packages will need special handling to ensure that the required steps are
+completed.
+
+As these reports are provided by each binary package, the reports from
+installed packages can be accessed at image build time and amalgamated into an
+image-wide report at that point, should it be required.
+As a binary can be built from multiple sources, each with differing licenses,
+it will be necessary for the report to detail each file that is used to create
+each binary and the licensing under which it is provided.
+In some circumstances, dual-licensed source code may allow for a binary to be
+effectively licensed under the terms of a single license; that is, the user
+has the option to pick a license that results in the whole binary being able
+to be provided under the terms of a single license.
+Where dual-licensed source code isn't used, the terms of all applicable
+licenses should be declared.
+The terms of the various licenses may be considered
+[compatible](https://en.wikipedia.org/wiki/License_compatibility), allowing
+the binary to effectively be managed under the terms of the more restrictive
+license.
+For example, for a binary derived from source code licensed under the GPLv2
+and other source code licensed under the MIT license, the terms of both apply
+to the binary; but as the terms of the MIT license will be met if the binary
+is used in accordance with the terms of the GPLv2, handling the binary as
+though it were licensed under the GPLv2 will ensure the terms of both are met.
+Not all possible combinations of licenses work out this way, which is why it
+is important to ensure that licensing is properly tracked.
+
+## Binary Licensing Reporting
+
+The approach each project using Apertis takes with regard to the reporting of
+licensing information should be driven by how this information is to be
+utilized; for example, some projects may wish to parse the license information
+and present it in a single BOM file in HTML, XML or human-readable text.
+
+For the images provided by the Apertis project, we plan to combine the reports
+saved in `/usr/share/doc/<package>/` into a single parsable file.
+Should it be required to provide some tool with which to interrogate the
+licensing which applies to the binary packages, the SPDX files can be imported
+into FOSSology.
+
+### CI Pipeline integration
+
+Apertis utilizes [Debos](https://github.com/go-debos/debos) in its image
+generation pipeline.
+There is an existing tool available for the merging of SPDX documents.
+The generation of a combined BOM can be realized by utilizing this tool in a
+script to be run at the appropriate time during the image build process, by
+integrating the script into the Debos recipes.
+Integrating scripts into the Debos recipes is an approach we have taken when
+generating the list of installed packages and the list of files.
+It reduces the overhead and potential complexity of decompressing and mounting
+the images that would be necessary should the BOM be generated in a separate
+step.
diff --git a/content/designs/canterbury-legacy-application-framework.md b/content/designs/canterbury-legacy-application-framework.md
new file mode 100644
index 0000000000000000000000000000000000000000..6e822ff22fcb6433a11afef2d6489eea4bbf32c6
--- /dev/null
+++ b/content/designs/canterbury-legacy-application-framework.md
@@ -0,0 +1,427 @@
+---
+title: Canterbury legacy application framework
+short-description: The obsoleted application framework based on Canterbury and Ribchester
+authors:
+  - name: Emanuele Aina
+  - name: Corentin Noël
+---
+
+# The Canterbury legacy application framework
+
+Apertis currently ships with a custom application framework based on the
+Canterbury app manager, which is in the process of being phased out in favor
+of upstream components like Flatpak; see the [](application-framework.md)
+document for more details.
+
+Flatpak and Canterbury cover the core tasks of an application framework:
+ * packaging
+ * distribution
+ * sandboxing
+
+When Canterbury was designed, Flatpak didn't exist and the available
+technologies were quite different from those in use today, so it's now
+time to reconsider our approach.
+
+## Flatpak
+
+* upstream, large community
+* mature, proven in the field
+* uses Linux containers to isolate the filesystem view from the application
+* sandbox based on Linux containers and seccomp
+* uses AppStream and .desktop files to encode metadata about the application
+* backed by OSTree
+* shared runtimes decouple libraries on the host from libraries depended on by
+  applications, so changes on the host won't break applications
+* deduplicates files across applications, runtimes and the host OSTree-based
+  system
+* SDK runtimes decouple development from the host
+* growing IDE support (GNOME Builder, Eclipse)
+* standardized D-Bus based portals for privileged operations
+* transparent support for portals already available in the most widespread
+  toolkits (Qt/GTK/etc.)
+* large userbase
+* available out-of-the-box on the most widespread distributions
+  (Debian/Ubuntu/Fedora/Red Hat/Suse/etc.)
+* well documented
+* additional permissions are managed through high-level entries in the
+  application manifest
+* sandboxed with seccomp
+* mature OTA mechanism for applications
+* user-facing app store available upstream
+* the upstream app store, FlatHub, can be deployed for Apertis, or the
+  experimental Magento app-store could be adapted
+* enables third-party applications (Sublime Text, Visual Studio Code, etc.)
to
+  be run on the SDK with no effort
+
+## Canterbury
+
+* Apertis specific, no community
+* not proven in the field
+* pre-dates the availability of Linux containers, does not use them
+* sandbox based on AppArmor
+* uses AppStream and .desktop files to encode metadata about the application
+* backed by OSTree
+* applications use libraries from the host, no decoupling
+* no concept of runtimes
+* no deduplication
+* limited IDE support (Eclipse)
+* very sparsely documented
+* security constraints expressed via low-level AppArmor profiles, no
+  higher-level permission system
+* no seccomp sandbox
+* OTA mechanism for applications and agents at the prototype stage
+  (Bosch-only, not available in Apertis)
+* user-facing app store at the prototype stage (Bosch-only, not available in
+  Apertis)
+* there's an experimental Magento-based app-store, not currently available in
+  Apertis
+
+# Comparison
+
+Since Apertis is meant to adopt upstream solutions whenever possible, it is
+natural for us to adopt Flatpak, but to do so the gaps that need to be filled
+must be evaluated.
+
+The two systems are very different and for this reason no transparent
+compatibility can be provided, but thanks to the modular approach in Apertis,
+Canterbury can be kept available in the repositories even if the reference
+setup will use Flatpak.
+
+Since the two systems share many underlying technologies (D-Bus, OSTree,
+etc.), their performance is comparable. The additional use of control groups
+in Flatpak doesn't add any noticeable overhead. Flatpak consists of just an
+executable setting up the environment and does not require an always-running
+daemon as Canterbury does, so there may be a negligible memory saving.
+
+## Applications concept
+
+The legacy Apertis application framework already defined the concept of
+application bundles. The new application framework defines Flatpak as the
+format to be used for bundles.
+
+## Application layout
+
+The application layout remains compatible with the legacy application
+framework; note that the layout is relative to the `/app/` folder inside the
+Flatpak.
+
+## Application entry points
+
+As the entry points were defined using the standard specification from
+FreeDesktop.org, they remain compatible with the new Apertis application
+framework and are exposed by the flatpak executable to the system when
+necessary.
+
+Desktop files should be updated to use Flatpak instead of Canterbury to launch
+the application, e.g. replacing
+
+```
+Exec=@bindir@/eye app-name @app_id@ play-mode stop url NULL
+```
+
+with
+
+```
+Exec=flatpak run app-name @app_id@ play-mode stop url NULL
+```
+
+## Application metadata
+
+The application metadata was specified using the FreeDesktop.org AppStream
+specification, which remains the main metadata specification for Flatpak.
+
+## Bundle spec
+
+The latest Canterbury application bundle specification has been largely based
+on the Flatpak one, in an initial effort to align Canterbury with recent
+upstream technologies:
+
+* the binary format is exactly the same;
+* in both cases AppStream is used for the bundle metadata;
+* entry points are defined with `.desktop` files both in Canterbury and
+  Flatpak;
+* installation paths differ, since Canterbury requires a unique installation
+  path while Flatpak relies on containers to put different contents on the
+  same path for each application, but from a practical point of view the
+  difference is purely cosmetic.
+
+## Permissions
+
+No high-level support for application permissions has been implemented in
+Canterbury; application access to resources was exclusively based on writing
+dedicated AppArmor profiles for each application and carefully reviewing them.
+
+Flatpak instead lets application authors specify in the application manifest a
+set of special high-level permissions. The Flatpak approach has been analysed
+in more detail in the original [](permissions.md) document, which already
+described the use-cases for the permissions mechanism in the context of the
+Apertis application framework.
+
+## Preferences and persistence
+
+The Apertis application framework satisfies the requirements of the legacy
+application framework. The only missing part is that application rollback is
+not able to revert the user data to a previous state.
+
+## Containerisation
+
+Canterbury pre-dates the maturity of containerization in Linux (cgroups and
+namespaces) and does not make use of it.
+
+Flatpak is instead heavily based on containers, providing much stronger
+isolation capabilities.
+
+## Large data sharing
+
+The Apertis application framework allows data to be shared using the standard
+mechanisms described by the FreeDesktop.org Desktop File specification.
+Any D-Bus enabled sharing service can be used when specifying the right
+interface in the Flatpak manifest. It is no longer possible to register a
+service by putting a file into `/var/lib/apertis_extensions/applications` at
+installation time, as the files are installed into a different path for each
+bundle.
+
+## Dialogs and notifications
+
+The Apertis application framework also uses the
+[Notification Specification](https://people.gnome.org/~mccann/docs/notification-spec/notification-spec-latest.html)
+and allows the same interface to be reused without any breakage.
+
+The dialog abstraction of the legacy application framework has never been
+implemented, as its design is subject to many questions.
+
+## Launch applications and services
+
+As Flatpak is well-integrated into existing environments and uses the same
+technology and protocols for its foundations, no problems are expected with
+Flatpak here.
+
+## Launch pre-configured default apps at start-up (Launcher / Global popup / Status Bar)
+
+The work has already been started, as shown by this
+[upstream request](https://github.com/flatpak/flatpak/issues/118)
+for the feature, making it a small gap to fill.
+
+## AppArmor
+
+Currently Apertis depends heavily on AppArmor to constrain services and
+applications: it is used to restrict filesystem access and mutually
+authenticate applications in a secure way when communicating over D-Bus.
+
+AppArmor is currently used in Apertis for two different purposes:
+* access constraints
+* secure identification of D-Bus peers
+
+While Flatpak has no support for AppArmor out of the box and adding it is not
+on the roadmap so far, the first use case is already covered by the use of
+Linux cgroups and namespaces, which provide more flexibility than AppArmor.
+Flatpak also ships a D-Bus proxy to manage access policies at the D-Bus level,
+since that needs finer control than cgroups and namespaces can provide.
+
+The higher-level access constraints implemented by Flatpak are much easier and
+safer for application authors to use than the low-level AppArmor policy
+language currently used by Apertis.
In that sense, the adoption of Flatpak
+would be aligned with the plan to provide a higher-level access constraints
+mechanism to application authors and shield them from the AppArmor policy
+language.
+
+Flatpak also includes the concept of "portals" to provide restricted access to
+resources to unprivileged applications, either by applying system-specific
+policies or by requiring user interaction. For instance, applications don't
+have access to user files, and file opening is handled via a privileged portal
+that ensures that applications can only access files users have given their
+consent to.
+
+The second use of AppArmor is something very few applications currently use,
+and portals seem well suited to replace its known usages:
+* Canterbury itself uses it to control applications: this is managed by Flatpak
+  by using cgroups
+* Newport (download manager) uses it to securely identify its clients: creating
+  a dedicated Flatpak portal would address the use-case with no reliance on
+  AppArmor (see the sketch below)
+* Frome (Magento app-store client) uses it to only let the
+  `org.apertis.Mildenhall.Setting` system application talk to it: a dedicated
+  Flatpak portal seems appropriate here as well
+* Beckfoot (network management service) uses it to talk with
+  `org.apertis.Mildenhall.StatusBar`, but Beckfoot itself was declared
+  obsolete long ago in {T3626} and the existing
+  [org.freedesktop.portal.Notification](https://flatpak.github.io/xdg-desktop-portal/portal-docs.html#gdbus-org.freedesktop.portal.Notification)
+  could be used instead.
+
+## Headless agents
+
+Flatpak focuses on graphical applications on the user session bus: nothing in
+its design prevents its usage for headless agents, and some testing didn't
+show any significant issue, but some rough edges are expected.
+
+Some one-time effort may be needed to consolidate this use-case in Flatpak.
+
+## System agents
+
+Canterbury can only manage user-level applications and agents; it doesn't
+currently have support for agents meant to be accessed on the system bus by
+different users.
+
+Flatpak is likewise not suited for system agents, as it focuses on the user
+session. Upstream explicitly considers system agents a non-use-case, and
+working in this direction would produce a significant delta that would
+noticeably increase the maintenance burden.
+
+Flatpak apps run in an environment that can never exercise capabilities
+(`CAP_SYS_ADMIN`, `CAP_NET_ADMIN`, etc.) or transition between uids, so some
+system services will not be possible to implement. System services that could
+run as an unprivileged system-level uid and don't do anything inherently
+privileged, like downloading files and putting them in a centralized location
+where all users can access them, should work. System services that need to be
+root to do inherently privileged things, like ConnMan/BlueZ, won't.
+
+systemd "portable services", perhaps deployed using OSTree, might be a
+reasonable solution for system agents. They are very new and not yet
+considered stable, but are specifically meant for this purpose.
+
+## Multiple entry points
+
+Canterbury supports multiple entry points in a single app-bundle, and Flatpak
+should likewise support exporting more than one desktop file which, as in
+Canterbury, is how entry points are implemented.
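+
+As an illustration of the portal approach suggested above for Newport and
+Frome, the following minimal sketch (hypothetical names throughout, using the
+`dbus-python` bindings) shows how a portal-style service can identify its
+callers at the D-Bus level instead of relying on AppArmor:
+
+```python
+import dbus
+import dbus.service
+from dbus.mainloop.glib import DBusGMainLoop
+from gi.repository import GLib
+
+DBusGMainLoop(set_as_default=True)
+
+class DownloadPortal(dbus.service.Object):
+    """Sketch of a Newport-like portal: callers are identified by their
+    unique bus name, which the service can map to an app ID in order to
+    apply per-application policy."""
+
+    @dbus.service.method('org.example.portal.Download1',
+                         in_signature='s', out_signature='s',
+                         sender_keyword='sender')
+    def Download(self, url, sender=None):
+        # A real portal would look up the application behind 'sender' and
+        # check its permissions before queuing the download.
+        print('download of %s requested by %s' % (url, sender))
+        return 'queued'
+
+bus = dbus.SessionBus()
+name = dbus.service.BusName('org.example.portal.Download', bus)
+DownloadPortal(bus, '/org/example/portal/Download')
+GLib.MainLoop().run()
+```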
+
+## Application manager D-Bus interface
+
+Canterbury exports an obsolete D-Bus interface with a set of largely unrelated
+methods to:
+* let applications register themselves
+* communicate to applications their new application state (show, hide, paused,
+  off)
+* hide global popups
+* get the currently active application
+* get the application that is currently using the audio source
+* find out if the currently active application needs an Internet connection
+
+Tracking the application that is currently "active" and hiding popups are tasks
+that should be handled by the compositor. The other interfaces are considered
+problematic as well.
+
+Canterbury-core, the version of Canterbury for headless systems, already
+doesn't ship the application manager interface, so there's no contingent need
+to reimplement it.
+
+## Audio management
+
+The legacy application framework was built around PulseAudio.
+
+Canterbury provides a custom audio manager which was already considered
+obsolete, and a [different design](https://designs.apertis.org/latest/audio-management.html)
+was proposed some time ago on top of PulseAudio.
+
+With the need for more containment in the framework, the Apertis application
+framework is meant to use PipeWire as a replacement for PulseAudio. The intent
+for PipeWire is to be a drop-in replacement for PulseAudio during the
+transition period. PipeWire also provides sink and source GStreamer elements
+to replace their PulseAudio counterparts.
+
+PipeWire is designed to let an external policy engine dictate how the audio
+should be routed and also provides proper security controls to restrict
+untrusted applications: for this reason AGL plans to use it as the foundation
+for their upcoming audio management solution, and Collabora is involved to
+ensure the embedded use-cases are covered.
+
+An alternative which is in widespread use is the GENIVI AudioManager, which
+can be used with Flatpak as well.
+
+Canterbury-core, the version of Canterbury for headless systems, already
+doesn't ship the audio manager, so there's no contingent need to reimplement
+it.
+
+## Hard Keys
+
+Canterbury provides a D-Bus interface for handling hard keys by communicating
+with the compositor over private interfaces. This is considered obsolete, and
+hard-key handling should happen in the compositor directly.
+
+Canterbury-core, the version of Canterbury for headless systems, already
+doesn't ship the hard key interface, so there's no contingent need to
+reimplement it.
+
+## Preference application launching
+
+Canterbury provides a D-Bus interface to let applications launch the preference
+manager to edit their preferences rather than providing their own interface.
+This also requires support in the preference manager, which is not currently
+implemented.
+
+Canterbury-core, the version of Canterbury for headless systems, already
+doesn't ship the preference launcher interface, so there's no contingent need
+to reimplement it.
+
+## Out-of-memory handling
+
+When memory pressure is detected, Canterbury tries to kill applications not
+currently visible. The private API between Canterbury and the Mildenhall
+compositor and its implementation were already known to be problematic and
+were considered to need a significant rework in any case, possibly moving
+them to a dedicated module.
+
+The module dedicated to the prioritization of applications in case of memory
+pressure can then be implemented to work with Flatpak applications seamlessly.
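+
+As a purely illustrative sketch, such a module could combine the kernel's
+pressure-stall information with Flatpak's knowledge of running instances; this
+assumes a kernel exposing PSI via `/proc/pressure/memory` and a Flatpak
+version providing `flatpak ps`:
+
+```python
+import subprocess
+
+def memory_pressure_avg10():
+    """Return the 10-second average memory pressure percentage (PSI)."""
+    with open('/proc/pressure/memory') as f:
+        fields = f.readline().split()  # "some avg10=0.12 avg60=... ..."
+    values = dict(field.split('=') for field in fields[1:])
+    return float(values['avg10'])
+
+def running_apps():
+    """List (instance, app-id) pairs for running Flatpak applications."""
+    out = subprocess.check_output(
+        ['flatpak', 'ps', '--columns=instance,application'], text=True)
+    return [tuple(line.split()) for line in out.splitlines() if line.strip()]
+
+if memory_pressure_avg10() > 10.0:
+    # Policy decision: pick a victim among the applications that are not
+    # currently visible; visibility information would come from the
+    # compositor.
+    print('under pressure, candidates:', running_apps())
+```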
+
+## Bandwidth prioritization
+
+Canterbury provides an experimental bandwidth prioritization system that is
+known to be problematic and has been considered obsolete, see {T4043} for
+details. No similar mechanism is available in Flatpak.
+
+## App store
+
+There's an experimental Magento-based app-store for Canterbury, but it is not
+yet available in Apertis. Flatpak has its own upstream app store, Flathub,
+which is Open Source and can be self-hosted. It doesn't currently implement
+payments in any form. Possible options here are either publishing the
+Magento-based code and adapting it to work with Flatpak, with a limited amount
+of changes but higher maintenance costs, or contributing to the implementation
+of payment methods in Flathub, with a higher one-time cost but likely lower
+on-going maintenance requirements.
+
+## Manage launched application windows using the Window Manager
+
+This has been deprecated since Apertis 17.09. Canterbury uses private
+interfaces with the compositor to:
+* show/hide splashscreens, but the WM should be able to display splashscreens
+  on its own without involving the application manager
+* learn which application is being displayed to manage the "back" stack, but
+  the WM is better positioned to handle the "back" stack on its own
+* inform the WM that the Last User Mode is being set up, but it appears that
+  the compositor takes no special action in that case
+
+## Notify applications whether they are in background or foreground
+
+This is not part of canterbury-core and has been deprecated since Apertis
+17.09. In a single fullscreen window scenario this can be handled by tracking
+whether the application has the focus or not. When multiple applications
+are visible at the same time, such as in the normal desktop case, the
+"background" status can be misleading, since applications can still be
+partially visible. Wayland provides the frame clock to throttle the rendering
+of application windows which are not visible.
+
+## Maintain an application stack
+
+Canterbury maintains a stack of applications
+to provide an Android-like back button. This feature should be implemented by
+the compositor to avoid layering violations. This is likewise not part of
+canterbury-core and has been deprecated since Apertis 17.09.
+
+## Store Last User Mode (LUM) information periodically and restore LUM on start-up
+
+This is not part of canterbury-core, and has been deprecated since 17.09.
+Canterbury saves the currently running applications, the "back" stack and the
+selected audio output in order to restore them on reboot.
+
+The compositor should handle the saving and restoration of the application
+stack, and the audio manager should save and restore the selected audio output
+without involving the application manager.
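+
+As an illustration of how little of this needs to live in an application
+manager, persisting that state can be as simple as the following sketch (the
+path and state format are hypothetical):
+
+```python
+import json
+import os
+
+STATE_FILE = os.path.expanduser('~/.cache/compositor/last-user-mode.json')
+
+def save_last_user_mode(app_stack, audio_output):
+    """Called by the compositor whenever the stack or audio sink changes."""
+    os.makedirs(os.path.dirname(STATE_FILE), exist_ok=True)
+    with open(STATE_FILE, 'w') as f:
+        json.dump({'stack': app_stack, 'audio-output': audio_output}, f)
+
+def restore_last_user_mode():
+    """Called once at start-up; returns defaults when no state was saved."""
+    try:
+        with open(STATE_FILE) as f:
+            state = json.load(f)
+    except (FileNotFoundError, ValueError):
+        return [], None
+    return state.get('stack', []), state.get('audio-output')
+```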
+
+# Conclusions
+
+* No major gaps have been identified between Canterbury and Flatpak
+* Flatpak has a very active upstream community and widespread adoption
+* Most of the Canterbury APIs not related to app-management have been formally
+  deprecated since Apertis 17.09
+* Providing compatibility between the two would be a very big undertaking with
+  unclear benefits, so it's actively discouraged and existing applications
+  need to be ported explicitly
+* HMI applications will need to be reimplemented in any case, as Mildenhall is
+  not a viable solution for product teams
+* The Canterbury application framework will remain available in Apertis as
+  an option at least until the new application framework has matured enough and
+  reference applications are available for it, and product teams will be able
+  to choose one or the other depending on their specific needs
diff --git a/content/designs/case-for-moving-to-debian.md b/content/designs/case-for-moving-to-debian.md
new file mode 100644
index 0000000000000000000000000000000000000000..cefe7aac9abefdceca64b52688a1b4f19cb21936
--- /dev/null
+++ b/content/designs/case-for-moving-to-debian.md
@@ -0,0 +1,86 @@
+---
+title: The case for moving to Debian stretch or Ubuntu 18.04
+authors:
+  - name: Andrej Shadura
+  - name: Sjoerd Simons
+---
+
+# Why was Apertis based on the Debian/Ubuntu ecosystem
+
+At the beginning of Apertis, a few platforms were considered for the base of Apertis: MeeGo, Tizen, OpenEmbedded Core, Debian and Ubuntu. The choice of the Debian/Ubuntu ecosystem was based on Debian being ‘one of the oldest and largest (most inclusive of OSS packages), and one of the first Linux distributions to feature an ARM port’, providing ‘a very solid distribution baseline’ and ‘a high degree of robustness against the involvement or not of individual contributing companies’, while Ubuntu is based on Debian but adds value important for Apertis (see below). Another point against the other alternatives (e.g. OpenEmbedded Core) was that Collabora and Bosch had already invested into Open Build Service (OBS) infrastructure, while Yocto/OpenEmbedded has its own build infrastructure and tools not compatible with OBS.
+
+Another important point was that Collabora employed and continues to employ many Debian package maintainers, who contribute to key OSS middleware packages within both the Debian and Ubuntu projects directly, which presented a serious benefit over other alternatives.
+
+# Why was Ubuntu taken as the direct upstream rather than Debian
+
+When the decision to use Ubuntu was taken, Ubuntu had several benefits over Debian, especially taking into account the initial goal of having an update cycle of around 6 months for the baseline platform.
+
+Debian only releases once every 2 to 2.5 years, while Ubuntu releases every 6 months, with every fourth release being a long-term support (LTS) release. This means that the only way of doing a refresh every 6 months based directly on Debian would be creating a snapshot of Debian testing, stabilising it and providing security support for it. Doing that purely for Apertis would of course require a significant amount of resources, but more importantly, it is essentially what Ubuntu is already doing. This made Ubuntu more suitable as a baseline for a 6-month update cycle.
+
+Furthermore, the Linaro initiative used Ubuntu as a reference distribution for all of their validation of hardware enablement. 
Linaro and Canonical engineers actively integrated the latest work from Linaro and SoC vendors, including Freescale, into Ubuntu. By using Ubuntu as a base, Apertis could benefit from and build on this work.
+
+At the time, Ubuntu was the *de facto* upstream of AppArmor; this included patched kernels to enable the latest features (D-Bus mediation, socket mediation, ptrace mediation, etc.) as well as changes to individual packages to improve their AppArmor profiles.
+
+# What has changed
+
+While doing two base platform refreshes every year has been successful, the users of Apertis weren't actually set up to follow such a fast cycle. On top of that, Ubuntu reduced the security support period of its non-LTS releases from 18 months after release to only 9 months after release. In other words, the upgrade window since the start of the Apertis project went from around one year after a platform refresh to only *3 months* after each platform refresh before upstream security support ended. Such a short time-frame is not achievable given the updates and validation required before a major product rollout.
+
+Due to the policy changes, it was decided to base the Apertis platform on LTS versions rather than refreshing on each version, utilising the longer security support period of these LTS releases. Apertis was last rebased onto the Ubuntu 16.04 LTS release (codenamed "Xenial Xerus").
+
+Ubuntu and Linaro are no longer collaborating as they were. Linaro are now supporting various boards using a Debian-based release, directly contributing to Debian and no longer supporting Ubuntu.
+
+AppArmor support has matured to the point where the features used by Apertis have been upstreamed, and as such Apertis is no longer tied to Ubuntu in this regard.
+
+## Debian Stretch
+* Benefits
+  - Debian is a community project, with no single company driving its development
+  - Maintenance of components we rely on is not tied to Canonical’s commercial strategies
+  - Security support for at least 5 years since the initial stretch release via the Debian LTS project
+  - More direct contribution path for package changes done for Apertis, since they can go directly into the main upstream distribution
+  - Debian stable and security updates tend to be more conservative and stable, making it easier to track over time
+  - Debian provides a backports repository for packages where a version newer than that in the stable release might be of interest
+* Risks
+  - Debian does not use a strict 2-year release cycle. Thus the Apertis platform update cycle also cannot be strictly time-based when using Debian
+
+## Ubuntu 18.04
+* Benefits
+  - Ubuntu has a strict time-based release cycle of a new LTS every two years
+  - Ubuntu also has a 6-month regular release cycle (with very limited support) should the decision to use LTS versions be revised
+* Risks
+  - Ubuntu is bound to the health and the technical and commercial strategy of Canonical. Canonical has shifted its focus several times in recent years, which has resulted in numerous changes not aligned with the goals of Apertis. Canonical has also a number of times introduced its own technologies rather than utilising ‘upstream’ technologies, for example Mir vs. Wayland and Snappy vs. Flatpak. Some of these choices have had an impact when utilising Ubuntu packages in Apertis, requiring extra work to be performed (e.g. disabling Mir).
+  - Ubuntu’s stable releases can have more aggressive updates to certain packages, which can destabilise things for Apertis as well as require extra support effort
+  - Ubuntu’s main support covers a subset of Debian packages available in Ubuntu’s *main* repository. A more complete set of packages can be found in Ubuntu’s *universe* repositories, however these tend to get less attention and essentially only receive as much support as Debian provides
+  - Ongoing support of Ubuntu depends on the commercial success of Canonical
+
+# Impact of move
+
+## Will the rebase process take longer if we move to Debian instead of the next Ubuntu LTS release?
+
+When Apertis tracked non-LTS Ubuntu releases, rebases were performed every six months following each release. Every rebase took about two months to complete. As part of the rebase procedure, the following tasks needed to be completed:
+
+* Fork Apertis in preparation of a new release
+* Set up Merge-our-Misc to track the latest Ubuntu release
+* Repeat until there are no build failures:
+  - Accept automatic merges produced by Merge-our-Misc
+  - If there are no automatic merges, process pending manual merges
+  - If new packages break the builds, fix them
+
+Since Apertis no longer pulls changes from regular Ubuntu releases, it is now quite far behind the upcoming release, which is set to be an LTS. The delta between Apertis and the current Ubuntu is about the same size as between Apertis and Debian, and will take a similar amount of time to process. Regardless of the decision to stay with Ubuntu or move to Debian, the following work will need to be done:
+
+* Switch to the latest versions of GCC and rebuild all packages with them
+* Rebase all packages to their newer versions from either Ubuntu 18.04 or Debian stretch, for each component
+* Review Apertis changes to the packages updated upstream, potentially dropping them if they are no longer relevant
+* Switch to the latest Java version for the SDK, dropping the Apertis patches fixing build failures with the older Java version Apertis shipped
+
+According to our estimates, the difference in the amount of time needed to perform that work is going to be negligible.
+
+## Ubuntu does validation; would this be missing if we move to Debian?
+
+We understand that Ubuntu does some hardware validation testing of standard Ubuntu configurations (which we do not use) against hardware from their partners who pay for it (https://certification.ubuntu.com/). The vast majority of the functionality such tests focus on is related to the kernel. Since we do not for the most part use the Ubuntu kernel and we target different hardware, these tests do not seem relevant to our use case, and thus we do not lose anything by moving to Debian.
+
+## Do we lose anything by moving to Debian?
+
+We believe that we do not lose anything other than the strict time-based release cycle by moving from Ubuntu to Debian. However, we feel that this is now less important given that we are now syncing on just Ubuntu LTS releases (every 2 years), and with the Debian release cycle tending towards two years this is not believed to be problematic.
+
+# Recommendations
+
+Collabora recommends rebasing on Debian Stretch for Apertis 18.06 and onwards. Most of the benefits of basing on Ubuntu have gone away since the original decision was taken in late 2011, while the project’s dynamics have also changed to better suit a Debian-based distribution. 
Basing on Debian rather than Ubuntu would move Apertis closer to its ultimate upstream (Ubuntu is itself a downstream of Debian), cutting out a middle-man which, as described above, currently brings very little to the table. This may also make the process of upstreaming appropriate package changes more efficient, reducing the maintenance overhead in Apertis.
diff --git a/content/designs/closing-ci-loop.md b/content/designs/closing-ci-loop.md
new file mode 100644
index 0000000000000000000000000000000000000000..f0b08f960de481b010d82586a62c2483cfa80070
--- /dev/null
+++ b/content/designs/closing-ci-loop.md
@@ -0,0 +1,590 @@
+---
+title: Closing the Automated Continuous Integration Loop
+short-description: Close the automated CI loop using the existing infrastructure.
+authors:
+  - name: Luis Araujo
+---
+
+# Background
+
+The last phase in the current CI workflow is running the LAVA automated tests,
+and there is no mechanism in place to properly process and report these test
+results, which leaves the CI loop incomplete.
+
+# Current Issues
+
+The biggest issues are:
+
+  - Test results need to be checked manually in the LAVA logs and dashboard.
+
+  - Bugs need to be reported manually for test issues.
+
+  - Weekly test reports need to be created manually.
+    This point might only be partially addressed by this concept document,
+    since a proper data store for test cases and test results has not been
+    defined.
+
+  - No mechanism is in place to conveniently send notifications about test
+    issues. Critical issues can very easily be overlooked.
+
+# Proposal
+
+This document proposes a design around the available infrastructure to
+implement a solution to close the CI loop.
+
+The document only covers automated tests; it leaves manual tests for a later
+proposal with a more complete solution.
+
+# Benefits of closing the CI loop
+
+Closing the loop will save time and resources by automating the manual tasks
+of checking automated test results and reporting their issues. It will also
+provide the infrastructure foundation for further improvements in tracking the
+overall project health.
+
+From a design perspective, it will also help to keep a more complete workflow
+in place for the whole infrastructure.
+
+Some of the most important benefits:
+
+  - Checking automated test results will need minimal or no manual
+    intervention.
+  - Automated test failures will be reported automatically and promptly.
+  - It will provide a more consistent and accurate way to track issues found
+    by automated tests.
+  - It will help to keep test reports up to date.
+  - It will provide the infrastructure components that will help to implement
+    further improvements in tracking the overall project health.
+
+Though the project as a whole will benefit from the above points, some
+benefits will be more relevant to particular project roles and areas. The
+following subsections list these benefits for each role.
+
+## Developers and Testers Benefits
+
+  - It will save time for developers and testers, since they won't need to
+    check automated test logs and test results manually in order to report
+    issues.
+  - Developers will be able to notice and work on critical issues much faster,
+    since failures will be visible sooner.
+
+## Managers Benefits
+
+  - Given that automated test issues will be reported promptly and more
+    consistently, managers will be able to take more accurate decisions
+    during planning. 
+
+## Product Teams Benefits
+
+  - The whole CI workflow for automated tests is properly implemented, so it
+    offers a more complete solution to other teams and projects.
+  - Closing the CI loop offers a more coherent infrastructure design that
+    other product teams can adapt to their own needs.
+  - Product teams will have a better view of the bugs opened in a given time
+    period, thus having a better idea of the overall project health.
+
+# Overview of steps to close the CI loop
+
+This is an overview of the phases required to close the current CI loop:
+
+  - Test results should be fetched from LAVA.
+  - Test results should be processed and optionally saved somewhere.
+  - Test results should be analyzed.
+  - Tasks for test issues should be created using the analyzed test results.
+
+# Current Infrastructure
+
+This section explores the different services available in our infrastructure
+that are proposed for implementing the remaining phases to close the CI loop.
+
+## LAVA User Notifications
+
+As of LAVA V2, there is a new feature called **Notification Callback** which
+allows LAVA to send a GET or POST request to a specified URL to trigger some
+action remotely. When using a POST request, test job information and results
+can be attached and sent along.
+
+This can be used to send the test results back to Jenkins from LAVA for
+further processing in new pipeline phases.
+
+## Jenkins Webhook Plugin
+
+This plugin provides an easy way to block a build pipeline in Jenkins until an
+external system posts to a webhook.
+
+This can be used to wait for the automated test results sent by LAVA from a
+new Jenkins job responsible for triggering the automated tests.
+
+## Phabricator API
+
+Conduit is the developer API for Phabricator, which can be used to implement
+the management of tasks.
+
+This API can be used (either with tools or language bindings) to manage
+Phabricator tasks from a Jenkins phase in the main pipeline.
+
+## Mattermost
+
+Mattermost is the chat system used by the Apertis project.
+
+Jenkins already offers a plugin to send messages to Mattermost.
+
+This can be used to send notification messages to the chat channels, for
+example, to notify the team once a critical test starts failing, or when a
+bug has been updated.
+
+# CI Workflow Overview
+
+The main workflow basically consists of combining the above-mentioned
+technologies to implement the different phases of the main CI pipeline.
+
+A general overview of the steps involved would be:
+
+  - Jenkins builds images and triggers LAVA jobs.
+  - Use the `webHook` pipeline plugin in Jenkins to wait for the LAVA test
+    results.
+  - LAVA executes automated test jobs and results are saved in its database.
+  - LAVA triggers a user notification callback, attaching test job information
+    and results, to send to the Jenkins webHook.
+  - Test results are received by Jenkins through the webHook.
+  - Test information is sent to a new `pipeline` to process and analyze test
+    results.
+  - Once test results are processed and analyzed, these are sent to a new
+    `pipeline` to manage Phabricator tasks.
+  - Optionally, a new Jenkins phase could send results to Mattermost or via
+    email.
+
+This complete loop will be executed every time new images are built.
+
+# Fetching Test Results
+
+The initial and most important phase in closing the loop is fetching and
+processing the automated test results from LAVA. 
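+
+To illustrate the kind of processing involved, the sketch below reduces a
+notification callback payload to a flat name-to-status mapping. It assumes
+the payload shape produced by LAVA v2 when the callback is configured with
+the `results` dataset, where each test suite arrives as a YAML string:
+
+```python
+import yaml
+
+def summarize_results(payload):
+    """Map 'suite/case' to its result ('pass', 'fail', ...) from a LAVA
+    notification callback payload (shape assumed, see above)."""
+    summary = {}
+    for suite, cases_yaml in payload.get('results', {}).items():
+        for case in yaml.safe_load(cases_yaml):
+            summary['%s/%s' % (suite, case['name'])] = case['result']
+    return summary
+
+example = {'id': 1234, 'status_string': 'complete',
+           'results': {'sanity': '- {name: boot, result: pass}\n'
+                                 '- {name: network, result: fail}\n'}}
+print(summarize_results(example))
+# {'sanity/boot': 'pass', 'sanity/network': 'fail'}
+```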
+
+The proposed solution in this document is to use the webHook plugin to fetch
+the LAVA test results from Jenkins once the automated test job is finished.
+
+Currently, LAVA tests are submitted in the last stage of the Jenkins pipeline
+job creating and publishing the images.
+
+Automated tests are organized in groups, which are submitted all at once using
+the `lqa` tool for each image type once the images are published.
+
+A webhook should be registered for each `test job` rather than for a group of
+tests, so a change in the way LAVA jobs are submitted is required.
+
+## Jenkins and LAVA Interaction
+
+The proposed solution is to separate the LAVA job submission stage from the
+main Jenkins pipeline job building images, and instead have a single Jenkins
+job that will take care of triggering the automated tests in LAVA once the
+images are published.
+
+The only required fields for the stage submitting the LAVA test jobs are the
+`image_name`, `profile_name`, and `version` of the image. A single Jenkins job
+could receive these values as arguments and trigger the automated tests for
+each of the respective images.
+
+The way LAVA jobs are submitted from Jenkins will also require some changes.
+The `lqa` tool currently submits several `groups` of test jobs at once, but
+since each test job requires a unique webhook, they will need to be submitted
+independently.
+
+One simple solution is to have `lqa` process the job templates first and then
+submit each processed job file with a unique webhook.
+
+Once all test jobs are submitted for a specific image type, the Jenkins
+executor will wait for all of their webhooks. This will block the executor,
+but since the webhook returns immediately for those jobs that have already
+posted their results to the webhook callback, it is fair to say that the
+executor will only block until the last completed test job sends its results
+back to Jenkins.
+
+After all results are received in Jenkins, they can be processed by the
+remaining stages required for task management.
+
+## Jenkins Jobs
+
+Since images are built from a single Jenkins job, the most sensible approach
+for the final implementation is to have a new Jenkins job receiving all the
+image type information and triggering tests for all of them, then a different
+job for processing test results, and possibly another one handling the task
+management phases.
+
+# Tasks Management
+
+One of the most important phases in closing the loop is reporting test issues
+in Phabricator.
+
+Test issues will be reported automatically in Phabricator as one task per test
+case instead of one task per issue. This has an important consequence,
+explained in the [](#considerations) section.
+
+This section gives an overview of the behaviour of this phase.
+
+## Workflow Overview
+
+Management of Phabricator tasks can be as follows:
+
+ 1) Query Phabricator to find all open tasks with the tag `test-failure`.
+
+ 2) Filter the list of received tasks to make sure only the exact tasks are
+    processed. For this, scanning for further specific fields in the task can
+    be helpful, for example, keeping only tasks with a specific task name
+    format.
+
+ 3) Fetch the analyzed test results.
+
+ 4) For each test, based on its results and checking the tasks list, do the
+    following:
+
+    a) Task exists: Add a comment to the task.
+
+    b) Task does not exist:
+       - If the test has status `failed`: Create a new task.
+       - If the test has status `passed`: Do nothing. 
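+
+The following sketch captures this decision logic in Python; the `phab` object
+stands for a thin, hypothetical wrapper around the Conduit API calls described
+earlier:
+
+```python
+def manage_task(test_name, status, open_tasks, phab):
+    """Apply steps 4a/4b above for a single test result.
+
+    open_tasks maps task titles to task IDs, pre-filtered to open tasks
+    carrying the `test-failure` tag (steps 1 and 2)."""
+    prefix = '%s failed:' % test_name
+    task_id = next((tid for title, tid in open_tasks.items()
+                    if title.startswith(prefix)), None)
+    if task_id is not None:
+        # 4a) The task exists: record this run as a comment.
+        phab.comment(task_id, 'New result for %s: %s' % (test_name, status))
+    elif status == 'failed':
+        # 4b) No task and the test failed: file a new one.
+        phab.create_task(
+            title='%s failed: automated test failure' % test_name,
+            tags=['test-failure'])
+    # 4b) No task and the test passed: nothing to do.
+```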
+
+## Considerations
+
+  - The comment added to the task will contain general information about the
+    failure with a link to the LAVA job logs.
+
+  - Tasks won't be reported per platform but per test case. Once a task for a
+    test case failure is reported, all platform failures for that test case
+    should be added as comments to that single task.
+
+  - Closing and verifying tasks will still require manual intervention. This
+    will help avoid the following corner cases:
+
+      - Flaky tests that would otherwise end up in a series of new tasks that
+        get auto-closed.
+      - Tests failing on one image that also succeed on a different image.
+
+  - If a test starts failing again for a previously closed task, a new task
+    will be created automatically for it, and manual verification is required
+    to check whether it is the same previously reported issue, in which case
+    it is recommended to add a reference to the old task.
+
+  - If, after fixing an issue for a reported task, a new issue arises for the
+    same test case, the same old task will be updated with this new issue.
+    This is an effect of reporting tasks per test case instead of per issue.
+    In such a case, manual verification can be used to confirm whether or not
+    it is the same issue, and a new subtask can be manually created by the
+    developer if deemed necessary.
+
+## Phabricator Conventions
+
+For the automation of Phabricator task management, certain conventions will
+need to be established in Phabricator. This will require minimal manual
+intervention.
+
+First of all, a specific user should be created in Phabricator to manage these
+tasks automatically.
+
+This user could be named `apertis-qa` or `apertis-tests`, and its only purpose
+will be to manage tasks automatically at this stage of the loop.
+
+A special tag and a specific format for the task name will also be used in
+tasks reported by this special user:
+
+  - The tag `test-failure` is the special tag for automated test failures.
+  - The task name will have the format: "{testcasename} failed: <Task title>".
+  - A `{testcasename}` tag can also be used if it is available for the test.
+
+# Design and Implementation
+
+This section gives a brief overview of the design of the main components to
+close the loop.
+
+Each of these components can be developed as independent modules, or as a
+single unit containing all the logic. The details and final design of these
+components depend on the most convenient approach chosen during
+implementation.
+
+## Design
+
+### Tests Processor
+
+This will take care of processing the test results as they are received from
+LAVA.
+
+LAVA test results are sent to Jenkins in a `raw` format, so work at this level
+could involve cleaning data or even converting test results to a new format so
+they can be more easily processed by the rest of the tools.
+
+### Tests Analyzer
+
+This will make sure that the test results data is in a consistent and
+convenient format to be used by the next module (`task manager`).
+
+This can be a new tool or just be part of the `test result processor`, running
+in the same Jenkins phase for convenience.
+
+### Tasks Manager
+
+This will receive the fully analyzed test results data, and ideally it
+shouldn't deal with any test data manipulation.
+
+It will take care of comparing the status of test results against Phabricator
+tasks, deciding the next steps to take, and managing those tasks accordingly. 
+
+### Notifier
+
+This can be considered an `optional` component and can involve sending further
+forms of notifications to different services, for example, sending messages to
+`Mattermost` channels or emails notifying about new critical bugs.
+
+## Implementation
+
+As originally envisioned, each of the design components could be written using
+a scripting language, preferably one that already offers good integration with
+our infrastructure.
+
+The `Python` language is highly recommended, as it already offers plugins or
+bindings for all the related infrastructure, so it would require minimal
+effort to integrate a solution written in this language.
+
+As a suggested environment, Jenkins could be used as the main place to
+execute and orchestrate each of the components. They could be executed using a
+different pipeline for each phase, or just a single pipeline executing all the
+functionality.
+
+For example, once the LAVA results are fetched in Jenkins, a new pipeline
+phase receiving test results can be started to execute the `test processor`
+and `test analyzer`, which in turn will send the output to a new pipeline
+phase to execute the `task manager` and later (if available) the `notifier`.
+
+## Diagram
+
+This is a diagram explaining the different infrastructure processes involved
+in the proposed design to close the CI loop.
+
+
+
+# Security Improvement
+
+The Jenkins webhook URL will be visible in the public LAVA test definitions,
+which might raise security concerns. For example, another process posting to
+the webhook before LAVA does would break the Jenkins job waiting for the test
+results.
+
+After researching several options to solve this issue, one solution has been
+found, which consists of Jenkins checking for a protected authorization
+header sent by LAVA when posting to the webhook.
+
+This solution requires changes both in the Jenkins plugin and in the LAVA
+code, and they need to be implemented as part of the solution for closing the
+CI loop.
+
+# Implementation
+
+The final implementation of the solution proposed in this document will mainly
+involve developing tools that need to be executed in Jenkins and will interact
+with the rest of the existing infra services: LAVA, Phabricator and optionally
+Mattermost.
+
+All tools and programs will be available from the project git repositories
+with their respective documentation, including how to set them up and use
+them.
+
+In addition to this, the final implementation will also include documentation
+about how to integrate, use and maintain this solution using the currently
+available infrastructure services, so other teams and projects can also make
+use of it.
+
+# Constraints or Limitations
+
+  - Some errors might not be trivially detected for automated tests, since
+    LAVA can fail in several ways; for example, infra errors sometimes might
+    be difficult to analyze and will still require manual intervention.
+
+  - The `webHook` plugin blocks the Jenkins pipeline. This might be an issue
+    in the long term and it should be an open point for further research in a
+    later version of this document or during implementation.
+
+  - This document deals with the existing infra, so a proper data store has
+    not been defined for test cases and test results. Creating the weekly
+    test reports will continue to require manual intervention.
+
+  - The test definitions for public LAVA jobs are publicly visible. The
+    Jenkins webhook URL will also be visible in these test definitions, which
+    can be a security concern. 
A solution for this issue is
+    proposed in [](#security-improvement).
+
+  - Closing and verifying tasks will still require manual intervention, due
+    to the points explained in the [](#considerations) section.
+
+# New CI Infrastructure and Workflow
+
+The main changes in the new infrastructure are that test results and test
+cases will be stored in SQUAD and Git respectively, and that there will be
+mechanisms in place to visualise test results and send notifications about
+test issues. The new infrastructure is defined in the
+[test data storage document][TestDataStorage].
+
+Manual tests are processed by the new infrastructure, so the new workflow will
+also cover closing the CI loop for manual tests.
+
+## Components and Workflow
+
+A new web service can be set up to receive the callback triggered by LAVA at a
+specific URL in order to fetch the automated test results, instead of using
+the Jenkins webhook plugin. This covers the case where the Jenkins webhook
+turns out not to be a suitable solution during implementation, either for the
+current CI loop infrastructure or for the new one.
+
+Therefore the following steps will use the term `Tests Processor System` to
+refer to the infrastructure in charge of receiving and processing these test
+results, which can be set up either in Jenkins or as a new infrastructure
+service.
+
+The main components of the new infrastructure can be broadly split into the
+following phases: automated tests, manual tests, tasks management, and
+reporting and visualization.
+
+### Automated Tests
+
+Workflow for automated tests:
+
+  - Jenkins builds images and triggers LAVA jobs.
+  - LAVA executes automated test jobs and results are saved in its database.
+  - LAVA triggers a user notification callback, attaching test job information
+    and results, to send to the tests processor system.
+  - The system opens an HTTP URL to wait for the LAVA callback in order to
+    receive test results.
+  - Test results are received by the tests processor system.
+  - Once test results are received, they are processed with the tool to
+    convert the test data into the SQUAD format.
+  - After the data is in the correct format, it is sent to SQUAD using the
+    HTTP API.
+
+### Manual Tests
+
+Test results will be entered manually by the tester using a new application,
+in this workflow named the `Test Submitter Application`.
+
+This application will prompt the tester to enter each manual test result, and
+will send the data to the SQUAD backend, as explained in the
+[test data storage document][TestDataStorage].
+
+The following workflow includes the processing of manual test results in the
+CI loop:
+
+  - The tester manually executes test cases.
+  - The tester enters test results into the test submitter application.
+  - The application sends the test data to the tests processor system using a
+    reliable network protocol.
+  - Test results are received by the tests processor system infrastructure.
+  - Once test results are received, they are processed with the tools to
+    convert the test data into the SQUAD format.
+  - After the data is in the correct format, it is sent to SQUAD using the
+    HTTP API.
+
+### Tasks Management
+
+This phase deals with processing the test results data in order to file and
+manage Phabricator tasks and send notifications.
+
+  - Once all test results are stored in the SQUAD backend, they might still
+    need to be processed by other phases in the tests processor system, and
+    sent to a new phase to manage Phabricator tasks. 
+  - The new Phabricator phase uses the test data to file new tasks following
+    the logic explained in the [](#tasks-management) section.
+  - The same phase, or a new one, could send results to Mattermost or via
+    email.
+
+### Reporting and Visualization
+
+A new web application dashboard will be used to view test results and generate
+reports and graphical statistics.
+
+This web application will fetch results from the SQUAD backend and will
+process them to generate the relevant statistics and graphics.
+
+The weekly test report will be generated either periodically or at any time as
+needed using this web application dashboard.
+
+More details can be found in the
+[reporting and visualization document][TestDataReporting].
+
+## General Workflow Overview
+
+This section gives an overview of the complete workflow in the following
+steps:
+
+  - Automated tests and manual tests are executed in different environments.
+
+  - Automated tests are executed in LAVA, and results are sent to the
+    `HTTP URL` service opened by the `Tests Processor System` to receive the
+    LAVA callback carrying the test results.
+
+  - Manual tests are executed by the tester. The tester uses the `Test
+    Submitter App` to collect test results and send them to the `Tests
+    Processor System` using a reliable network protocol for data transfer.
+
+  - All test results are processed and converted to the SQUAD JSON format by
+    the `Test Processor and Analyzer`.
+
+  - Once test results are in the correct format, they are sent to the SQUAD
+    backend using the SQUAD HTTP API.
+
+  - Test results might still need to be processed by the `Test Processor and
+    Analyzer` in order to be sent to the new phases. Once results are
+    processed, they are passed to the `Task Manager` and `Notification
+    System` phases to manage Phabricator tasks and send email or Mattermost
+    notifications respectively.
+
+  - From the SQUAD backend, the new `Web Application Dashboard` fetches test
+    results periodically or as needed to generate test result views,
+    graphical statistics, and reports.
+
+The following diagram shows the above workflow:
+
+
+
+## New Infrastructure Migration Steps
+
+- Set up a SQUAD instance. This can be done using a Docker image, so the setup
+  should be very straightforward and convenient to replicate downstream.
+
+- Extend the current `Test Processor System` to submit results to SQUAD. This
+  basically consists of using the SQUAD URL API to submit the test data.
+
+- Convert the test cases from the wiki format to the strictly defined YAML
+  format.
+
+- Write an application to render the YAML test cases, guide testers through
+  them and provide them with a form to submit their results. This is the
+  `Test Submitter App` and can be developed as either a web frontend or a
+  command line tool.
+
+- Write the reporting web application which fetches results from SQUAD and
+  renders reports. This is the `Web App Dashboard` and it will be developed
+  using existing modules and frameworks in such a way that deployment and
+  maintenance can be done in the same way as for other infrastructure
+  services.
+
+## Maintenance Impact
+
+The new components required for the new infrastructure are the `Test
+Submitter`, `Web Application Dashboard` and SQUAD, along with some changes
+needed for the `Test Processor System` to receive the manual test results and
+send test data to SQUAD. 
+
+SQUAD is an upstream dashboard that can be deployed using Docker, so it can be
+conveniently used by other projects, and its maintenance effort won't be
+greater than that of other infrastructure services.
+
+The test submitter and web application dashboard will be developed reusing
+existing modules and frameworks for each of their functionalities. They
+mainly need to use already well-defined APIs to interact with the rest of the
+services, and they will be designed in such a way that they can be
+conveniently deployed (for example, using Docker). They are not expected to
+be large applications, so their maintenance should be comparable to that of
+other tools in the project.
+
+The test processor is a system tool, developed in a modular way, so each
+component can reuse existing modules or libraries to implement the required
+functionality, for example, making use of an existing HTTP module to access
+the SQUAD URL API. It therefore won't require a big maintenance effort, and
+it will practically be the same as for other infrastructure tools in the
+project.
+
+Converting the test cases to the new YAML format can be done manually, and a
+small tool can be used to assist with the format migration (for example, to
+sanitize the format). This should be a one-time task, so no further
+maintenance is involved.
+
+# Links
+
+LAVA Notification Callback:
+  - https://lava.collabora.co.uk/static/docs/v2/user-notifications.html#notification-callback
+
+Jenkins Webhook Plugin:
+  - https://wiki.jenkins.io/display/JENKINS/Webhook+Step+Plugin
+
+Phabricator API:
+  - https://phabricator.apertis.org/conduit
+
+Mattermost Jenkins Plugin:
+  - https://wiki.jenkins.io/display/JENKINS/Mattermost+Plugin
+
+
+[TestDataStorage]: test-data-storage.md
+
+[TestDataReporting]: test-data-reporting.md
diff --git a/content/designs/clutter.md b/content/designs/clutter.md
new file mode 100644
index 0000000000000000000000000000000000000000..3e56113912fc0637dd01f99cee04d988363e5227
--- /dev/null
+++ b/content/designs/clutter.md
@@ -0,0 +1,454 @@
+---
+title: Clutter and Multitouch
+short-description: Issues with Clutter (obsolete)
+authors:
+  - name: Tomeu Vizoso
+---
+
+# Clutter and Multitouch
+
+## Introduction
+
+This document explains Collabora's design decisions on several issues related
+to the main UI toolkit used in Apertis: Clutter.
+
+## Multi-touch
+
+This section describes the support for multi-touch (MT) events and
+gestures in the Apertis middleware. It explains which requirements
+Collabora will support and describes the general architecture of MT
+on Linux and X.Org.
+
+When we mention MT in this document, we refer to the ability of
+applications to react to multiple touch event streams at the same time.
+By gestures, we refer to the higher-level abstraction that groups
+individual touch events into a single meaningful event.
+
+At this point in time (Q1 2012), MT and gesture functionality is
+implemented in several consumer products but is only starting to be
+available in generally available frameworks such as X.Org and HTML.
+As will be explained later, this is reflected in X.Org only just having
+been released with [MT functionality], Clutter not having MT support in
+a release yet, and the lack of high-level gesture support in X.Org-based
+toolkits. In the same vein, MT is not yet standardized on the web. This
+design will discuss the challenges posed by these facts and ways of
+overcoming them. 
+
+### The multi-touch stack
+
+For the purposes of this document, the MT stack on X.Org is layered as
+follows:
+
+
+
+#### Compositor
+
+The compositor has to be able to react to gestures that may happen
+anywhere in the display. The X server usually delivers events to the
+window where they happen, so the compositor overrides this behavior by
+telling the X server that it has an interest in all events regardless of
+where they happen (specifically, it does so by registering a passive
+grab on the root window).
+
+The compositor will receive all events and decide for each whether to
+handle it at this level, or let it pass through to the application to
+which the underlying window belongs.
+
+For touch events, this isn't done for individual events but rather for
+touch sequences. A touch sequence is the series of touch events that
+starts when a finger is placed on the screen, and finishes when it is
+lifted. Thus, a touch sequence belongs to a single finger, and a gesture
+can be composed of as many touch sequences as fingers are involved.
+
+There is a period of time during which the compositor inspects the touch
+events as they come and decides whether it should handle this particular
+gesture, or whether it should be ignored and passed to the application.
+If the compositor decides the latter, the events that had already been
+delivered to the compositor will be replayed to the application.
+
+The period of time that the compositor needs to decide whether a gesture
+should be handled by applications should be as small as possible, in
+order not to disrupt the user experience.
+
+An application can tell the X server that it doesn't want to have to
+wait until the compositor has let the touch sequence pass. In that case,
+either the gesture shouldn't cause any visible effects in the UI, or it
+should be reverted in case the compositor ends up deciding to handle the
+touch sequence by itself.
+
+#### Applications
+
+Widgets inside applications can react to MT events in a way similar to
+how they react to single-touch events. Additionally, some toolkits
+provide extra functionality that makes it easier to react to gestures.
+Widgets can either react to a few predefined gestures (tap, panning,
+pinch, etc.), or they can implement their own gesture recognizers by
+means of the lower-level MT events.
+
+#### UI toolkits
+
+As mentioned before, UI toolkits provide API for reacting to MT events
+and usually also functionality related to gestures. Because MT is so new
+to X.Org, UI toolkits based on X.Org don't yet implement everything that
+applications would need regarding MT and gestures, so for now additional
+work needs to happen at the application level.
+
+#### libXi
+
+This library allows toolkits to communicate with the XInput extension in
+the X.Org server. Communication with the X.Org server is asynchronous
+and complex, so having a higher-level library simplifies this
+interaction.
+
+#### X.Org server
+
+The X.Org server delivers input events to each application based on the
+coordinates of the event and the placement of the application windows.
+
+#### evdev X.Org input driver
+
+This input driver for X.Org uses udev to discover devices and evdev to
+get input events from the kernel, and posts them to the X.Org server.
+
+If a jitter-reduction filter is needed and it's impossible to apply it
+in the kernel device driver, then we recommend patching the evdev X.Org
+input driver. 
+
+#### MTDev
+
+For device drivers that use the legacy MT protocol as opposed to the new
+slots protocol, the X.Org input driver will use libmtdev to translate
+events from the old protocol (type A) to the new one (type B).
+
+> See <http://www.kernel.org/doc/Documentation/input/multi-touch-protocol.txt>
+
+#### Kernel event driver (evdev)
+
+This kernel module will emit input events in a protocol that the evdev
+X.Org input driver can understand.
+
+#### Kernel device driver
+
+This kernel module is hardware-dependent and will interface with the
+hardware and pass input events to the evdev event driver.
+
+### Requirements
+
+#### Multi-touch event handling in Clutter applications
+
+Clutter will have APIs for reacting to multi-touch input events and for
+recognizing gestures.
+
+The Apertis middleware will have the support that Clutter requires to
+provide MT and gesture functionality, as described later in this
+document.
+
+Though it is expected that Clutter will eventually provide support for
+recognizing a few basic gestures such as taps and panning, more advanced
+gestures will have to be implemented outside Clutter. In the Apertis
+case, recognizing additional gestures will have to be done by the
+applications themselves or by the SDK API.
+
+New gestures will be developed using the gesture framework in Clutter,
+regardless of whether the gesture recognition is implemented in the SDK,
+compositor or applications. In other words, new gestures can be
+developed making use of the Clutter API, but the code that implements
+them can belong to applications, compositors or libraries. No
+modifications to Clutter are required to implement new gestures.
+
+#### Full-screen event handling in applications
+
+Applications should be able to handle events anywhere on the screen even
+if their windows don't cover the whole of it. For example, there may be
+windows belonging to the system such as a launcher panel or a status bar
+on the screen, but a gesture that starts in one of the auxiliary windows
+should be handled by the focused application.
+
+#### Multi-touch event handling in Mutter
+
+The compositor based on Mutter will be able to react to multi-touch
+input events and recognize gestures using the same Clutter API as
+applications.
+
+The compositor will be able to register for MT sequences before
+applications get them, so it can claim ownership over them in case a
+system-wide gesture is detected. Even then, applications will be able to
+get all events that happen in their windows, though care needs to be
+taken in case the compositor ends up claiming ownership.
+
+#### Multi-touch event handling in web applications
+
+Although there are no approved specifications yet on how browsers should
+expose MT events to web content, some browsers have already started to
+provide experimental APIs, and some websites are using them. Most
+notable are the Safari browser on iOS and websites that specifically
+target iPhone and iPad devices, though nowadays other WebKit-based
+browsers implement MT events and more websites are starting to use them.
+
+A [spec][w3-touch-spec] is being drafted by the W3C based on the WebKit
+implementation, but attention must be paid to the fact that, because
+it's still a draft, it may change in ways that are incompatible with
+WebKit's implementation, and the latter may not always be in sync with
+the spec.
+
+#### Support two separate touchscreens
+
+The Apertis middleware will be able to drive two screens, each with
+multi-touch support. 
+
+#### Support out-of-screen touch events
+
+The reactive area of the touchscreen may extend past the display, and the
+user will be able to interact with out-of-screen UI elements.
+
+#### Actors with bigger reactive area
+
+So that the UI is more tolerant of erratic touch events (caused, for
+example, by a bumpy road), some actors will be reactive past the
+boundaries of their representation on the screen.
+
+### Approach
+
+#### Multi-touch event handling in Clutter applications
+
+MT support in X.Org is very recent, so Collabora will have to update
+several components (mainly belonging to X) in the stack because Precise
+is not going to ship with the versions that are needed.
+
+Clutter 1.8 had support for single-touch gestures. In 1.10, support for
+multi-touch event handling landed, and it is expected that in 1.12
+(August 2012), support for multi-touch gestures will be added. It's also
+planned to have support for rotation and pinch gestures in
+[Clutter 1.12].
+
+#### Full-screen event handling in applications
+
+In order for applications to be able to handle events anywhere on the
+screen even if their windows do not cover the whole of it, applications
+will have to set grabs on the other visible windows, even if they don't
+belong to the application process.
+
+So in the case that the currently-focused application is displayed along
+with a launcher panel and a status bar, the application process should
+set a grab on those windows for the events that it is interested in.
+When another application becomes focused, the first one releases the
+grab and the second takes it.
+
+In order to make sure that the second application takes the grab as soon
+as possible, it should try calling XIGrabTouchBegin repeatedly until it
+stops failing with BadAccess (this will happen once the previously
+focused application has removed its grab).
+
+So that the compositor can still handle system-level gestures as
+explained in the requirements above, it will have to set a grab on an
+invisible window that is the parent of each additional window.
+
+This is so complex because this scenario is a bit removed from the
+design of event delivery in X. In Wayland, as the compositor is also the
+display server and has total control over event delivery, it could
+redirect events to application windows instead of to the panels.
+
+#### Multi-touch event handling in Mutter
+
+Current releases of Mutter use Xlib for event management, which doesn't
+support multi-touch. To Collabora's knowledge, there aren't as yet
+any Mutter-based products that make use of system-wide multi-touch
+gestures.
+
+> See <https://wiki.gnome.org/GnomeOS/Design/Whiteboards/Touchscreen#Mutter_problems>
+
+Collabora has [modified][tomeu-multitouch]
+Mutter to allow plugins to register for touch events and to pass them to
+Clutter so subclasses of ClutterGestureAction can be used.
+
+Though applications are able to start receiving MT events even before
+the compositor rejects ownership, Collabora recommends that applications
+don't do that, and instead that all efforts are directed towards having
+the compositor recognize gestures as fast as possible.
+
+Otherwise, it will be very hard to avoid glitches in the user experience
+when the compositor decides to handle a gesture and the application has
+already started to react to it. 
+
+By limiting system-wide gestures to those with 4 or 5 fingers as iOS 5
+does (what Apple calls *multitasking gestures*), the compositor should
+be able to decide whether to take ownership of a MT sequence with a
+minimal delay and without applications having to do anything on their
+side. Gestures with fewer than 4 fingers will be considered as intended
+for applications and the compositor will decline ownership immediately.
+
+This diagram illustrates how the different components interact when
+handling touch events:
+
+
+
+If there is a window that gets placed on top of the others
+(specifically, the “busy” indicator animation) and it shouldn't get any
+events, it can be marked as such with the XShapeCombineRegion call,
+passing an empty region and the ShapeInput destKind.
+
+> See <http://article.gmane.org/gmane.comp.kde.devel.kwin/19992>
+
+This way, any events that the compositor doesn't intercept will be
+delivered to the currently-focused application.
+
+#### Multi-touch event handling in web applications
+
+Collabora will implement the port-specific bits of MT in WebKit-Clutter
+to ensure that it has state-of-the-art WebKit MT support, but won't be
+changing the behavior of the generic WebKit implementation nor working
+on the specification level.
+
+Collabora won't be implementing any high-level gesture support because
+such gestures are far from being specified and their support in browsers
+is very experimental.
+
+#### Support two separate touchscreens
+
+There is support in X.Org for arbitrarily matching input devices to
+screens, though the configuration [isn't straightforward][xorg-transfo-matrix].
+Collabora will test the simultaneous support of two touchscreens and
+produce documentation about how to configure X.Org in this regard during
+the development phase of the project.
+
+#### Support out-of-screen touch events
+
+Regarding out-of-screen events support, we propose writing a daemon that
+listens for events from the touchscreen kernel driver and that, based on
+its configuration, translates those to key presses for special key codes
+that correspond to each button.
+
+As the X evdev driver will also get those events, it has to be
+configured to ignore touch events outside the screen area.
+
+In the SDK, a wrapper around Xephyr will be provided that synthesizes
+key events when buttons around the Xephyr screen are pressed, simulating
+the ones in the real hardware.
+
+#### Actors with bigger reactive area
+
+Clutter actors tell the Clutter framework in which area of the screen
+they are sensitive to pointer events. This area usually matches the area
+that the actor uses to display itself, but the actor could choose to
+mark a bigger area as its reactive area.
+
+Though the Clutter maintainer has recommended this approach, he warns
+that the optimization that culls actors based on their paint volumes
+might get in the way in this case.
+
+Collabora will verify that this works and communicate with the upstream
+community in case any problem is discovered.
+
+### Risks
+
+The [DDX] driver provided by the hardware vendor should support
+having a frame-buffer that is bigger than the actual display resolution,
+for the out-of-screen touch events.
+
+## Smooth panning
+
+This section proposes an improvement to the kinetic scrolling
+functionality in Mx so that panning is smooth even when the input
+device's resolution is very low. This will affect only Clutter
+applications that use MxKineticScrollView as the container of scrollable
+areas.
+
+The problem with input devices with low resolution is that as the finger
+moves during panning, the motion events received by the application are
+similarly low-resolution (i.e., occurring infrequently). Given that Mx
+currently updates the position in the scrolled view to match the
+position of the finger, the panning movement appears “jumpy”.
+
+### Requirements
+
+When panning, the position in the scrolled view will smoothly
+interpolate to the last finger position, instead of jumping there
+straight away. The visual effect would be that of the scroll position
+lagging slightly behind the finger. The lower the resolution of the
+touch screen, the bigger the lag.
+
+### Approach
+
+Collabora will rewrite the part of the [MxKineticScrollView] widget that
+tracks the finger position when panning, ideally using the function
+[mx_adjustment_interpolate] to animate the movement along the path
+between the current position and the finger position. The time function
+([ClutterAlpha]) used in the interpolation animation will be
+configurable, as well as the speed at which the scrolling position will
+follow the finger.
+
+### Risks
+
+Upstream could decide to reject this feature when it is proposed for
+inclusion because of the substantial added complexity to a widget
+(MxKineticScrollView) that is already pretty complex. However,
+preliminary discussions with the Mx maintainers show that they are
+interested in gaining this functionality.
+
+Another risk is Intel not funding all the work in Clutter that it has
+committed to. In that case, Collabora may need to do the work.
+
+## Documentation
+
+The following items have been identified for future documentation work
+later in the project:
+
+ - Add a section to the Clutter Cookbook about implementing a
+   ClutterGestureAction subclass (a minimal skeleton is sketched after
+   this list).
+
+ - Best practices about MT gestures in user experience, both
+   system-wide and application gestures. Compare these guidelines with
+   the equivalent ones in iOS and Android.
+
+ - Best practices about performance with Clutter and Mx (including how
+   to write containers which are responsive independently of the number
+   of children and how to drag actors across the screen).
+
+ - Best practices about using and writing reusable UI components
+   (including Mx widgets), and explicitly these subjects (to be
+   specified further at a later stage):
+
+   - Panning actors
+
+   - Finger moves outside the “active area” of an actor (e.g., moving
+     the button of a video timeline very fast)
+
+   - Snap forward/backward to final position
+
+   - Expand/shrink of groups (sliding)
+
+ - Best practices about writing applications in which functionality and
+   UI are separate, so derivatives can be written by replacing the UI
+   and without having to modify the rest of the application.
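+
+The following is a minimal, illustrative skeleton of the kind of
+ClutterGestureAction subclass such a cookbook section would cover. The
+type name and the recognition logic are hypothetical placeholders; only
+the general vfunc wiring is meant to reflect the Clutter gesture
+framework.
+
+```c
+#include <clutter/clutter.h>
+
+/* Hypothetical one-finger swipe recognizer; reuse the parent instance
+ * and class structures since no extra state is stored. */
+typedef ClutterGestureAction      ExampleSwipeAction;
+typedef ClutterGestureActionClass ExampleSwipeActionClass;
+
+G_DEFINE_TYPE (ExampleSwipeAction, example_swipe_action,
+               CLUTTER_TYPE_GESTURE_ACTION)
+
+static gboolean
+example_swipe_action_gesture_begin (ClutterGestureAction *action,
+                                    ClutterActor         *actor)
+{
+  /* Returning TRUE starts recognition; the touch points can be
+   * inspected with clutter_gesture_action_get_motion_coords(). */
+  return TRUE;
+}
+
+static void
+example_swipe_action_gesture_end (ClutterGestureAction *action,
+                                  ClutterActor         *actor)
+{
+  /* The gesture completed: this is where a real implementation would
+   * emit a signal or invoke a callback. */
+}
+
+static void
+example_swipe_action_class_init (ExampleSwipeActionClass *klass)
+{
+  ClutterGestureActionClass *gesture_class =
+      CLUTTER_GESTURE_ACTION_CLASS (klass);
+
+  gesture_class->gesture_begin = example_swipe_action_gesture_begin;
+  gesture_class->gesture_end = example_swipe_action_gesture_end;
+}
+
+static void
+example_swipe_action_init (ExampleSwipeAction *self)
+{
+}
+```
+
+Such an action would be attached to an actor with
+clutter_actor_add_action(), without any modifications to Clutter itself.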
+
+### Design notes
+
+The following items have been identified for future investigation and
+design work later in the project and are thus not addressed in this
+design:
+
+ - Support two separate touchscreens
+
+ - Support out-of-screen touch events
+
+ - Implement jitter reduction (an algorithm already exists), taking
+   into account that the input driver may be a binary blob
+
+ - Implement zoom via pinch gestures in Google Earth without having
+   access to its source code
+
+[MT functionality]: http://lists.x.org/archives/xorg-announce/2012-March/001846.html
+
+[w3-touch-spec]: http://dvcs.w3.org/hg/webevents/raw-file/tip/touchevents.html
+
+[Clutter 1.12]: http://wiki.clutter-project.org/wiki/ClutterRoadMap#1.12
+
+[tomeu-multitouch]: http://blog.tomeuvizoso.net/2012/09/multi-touch-gestures-in-gnome-shell.html
+
+[xorg-transfo-matrix]: http://www.x.org/wiki/XInputCoordinateTransformationMatrixUsage
+
+[DDX]: http://dri.freedesktop.org/wiki/DDX
+
+[MxKineticScrollView]: http://docs.clutter-project.org/docs/mx/stable/MxKineticScrollView.html
+
+[mx_adjustment_interpolate]: http://docs.clutter-project.org/docs/mx/stable/MxAdjustment.html#mx-adjustment-interpolate
+
+[ClutterAlpha]: http://developer.gnome.org/clutter/stable/ClutterAlpha.html
diff --git a/content/designs/coding_conventions.md b/content/designs/coding_conventions.md
new file mode 100644
index 0000000000000000000000000000000000000000..cc7617333ce1fa3220f8aa5fa481ef955d191c0d
--- /dev/null
+++ b/content/designs/coding_conventions.md
@@ -0,0 +1,512 @@
+# Coding conventions
+
+Coding conventions are a nebulous topic, covering code formatting and
+whitespace, function and variable naming, namespacing, use of common GLib
+coding patterns, and other things. Since C is quite flexible, this document
+mostly consists of a series of patterns (which it’s recommended code follows)
+and anti-patterns (which it’s recommended code does **not** follow). Any
+approaches to coding which are not covered by a pattern or anti-pattern are
+completely valid.
+
+Guidelines which are specific to GLib are included on this page; guidelines
+specific to other APIs are covered on their respective pages.
+
+## Summary
+
+* Use the GLib coding style, with vim modelines.
+  ([Code formatting](#code-formatting))
+* Consistently namespace files, functions and types.
+  ([Namespacing](#namespacing))
+* Always design code to be modular, encapsulated and loosely coupled.
+  ([Modularity](#modularity))
+  * Especially by keeping object member variables inside the object’s private
+    structure.
+* Code defensively by adding pre- and post-condition assertions to all public
+  functions.
+  ([Pre- and post-condition assertions](#pre-and-postcondition-assertions))
+* Report all user errors (and no programmer errors) using GError.
+  ([GError usage](#gerror-usage))
+* Use appropriate container types for sets of items. ([GList](#glist))
+* Document all constant values used in the code.
+  ([Magic values](#magic-values))
+* Use standard GLib patterns for defining asynchronous methods.
+  ([Asynchronous methods](#asynchronous-methods))
+* Do not call any blocking, synchronous functions.
+  ([Asynchronous methods](#asynchronous-methods))
+* Do not run blocking operations in separate threads; use asynchronous calls
+  instead. ([Asynchronous methods](#asynchronous-methods))
+* Prefer enumerated types over booleans whenever there is the potential for
+  ambiguity between true and false.
+  ([Enumerated types and booleans](#enumerated-types-and-booleans))
+* Ensure GObject properties have no side-effects.
+  ([GObject properties](#gobject-properties))
+* Treat resources as heap-allocated memory and do not leak them.
+  ([Resource leaks](#resource-leaks))
+
+## Code formatting
+
+Using a consistent code formatting style eases maintenance of code, since
+contributors only have to learn one coding style for all modules, rather than
+one per module.
+
+The coding style in use is the popular
+[GLib coding style](https://developer.gnome.org/programming-guidelines/unstable/c-coding-style.html.en),
+which is a slightly modified version of the
+[GNU coding style](http://www.gnu.org/prep/standards/standards.html#Writing-C).
+
+Each C and H file should have a vim-style modeline, which lets the programmer’s
+editor know how code in the file should be formatted. This helps keep the
+coding style consistent as the files evolve. The following modeline should be
+put as the very first line of the file, immediately before the
+[copyright comment](license-applying.md#licensing-of-code):
+
+```c
+/* vim:set et sw=2 cin cino=t0,f0,(0,{s,>2s,n-s,^-s,e2s: */
+```
+
+For more information about the copyright comment, see
+[Applying Licensing](https://designs.apertis.org/latest/license-applying.html).
+
+### Reformatting code
+
+If a file or module does not conform to the code formatting style and needs to
+be reindented, the following command will do most of the work — but it can go
+wrong, and the file **must** be checked manually afterwards:
+
+```sh
+indent -gnu -hnl -nbbo -bbb -sob -bad -nut /path/to/file
+```
+
+To apply this to all C and H files in a module:
+
+```sh
+git ls-files '*.[ch]' | \
+xargs indent -gnu -hnl -nbbo -bbb -sob -bad -nut
+```
+
+Alternatively, if you have a recent enough version of Clang (>3.5):
+
+```sh
+git ls-files '*.[ch]' | \
+xargs clang-format -i -style=file
+```
+
+using a `.clang-format` file (added to git) in the same directory, containing:
+
+```yaml
+# See https://designs.apertis.org/latest/coding_conventions.html#code-formatting
+BasedOnStyle: GNU
+AlwaysBreakAfterDefinitionReturnType: All
+BreakBeforeBinaryOperators: None
+BinPackParameters: false
+SpaceAfterCStyleCast: true
+# Our column limit is actually 80, but setting that results in clang-format
+# making a lot of dubious hanging-indent choices; disable it and assume the
+# developer will line wrap appropriately. clang-format will still check
+# existing hanging indents.
+ColumnLimit: 0
+```
+
+## Memory management
+
+See [Memory management](https://wiki.apertis.org/Guidelines/Memory_management)
+for some patterns on handling memory management; particularly
+[single path cleanup](https://wiki.apertis.org/Guidelines/Memory_management#Single-path_cleanup).
+
+## Namespacing
+
+Consistent and complete namespacing of symbols (functions and types) and files
+is important for two key reasons:
+
+1. Establishing a convention which means developers have to learn fewer symbol
+   names to use the library — they can guess them reliably instead.
+1. Ensuring symbols from two projects do not conflict if included in the same
+   file.
+
+The second point is important — imagine what would happen if every project
+exported a function called `create_object()`. The headers defining them could
+not be included in the same file, and even if that were overcome, the
+programmer would not know which project each function comes from.
+Namespacing eliminates these problems by using a unique, consistent prefix for
+every symbol and filename in a project, grouping symbols into their projects
+and separating them from others.
+
+The conventions below should be used for namespacing all symbols. They are the
+[same as used in other GLib-based projects](https://developer.gnome.org/gobject/stable/gtype-conventions.html),
+so should be familiar to a lot of developers:
+
+* Functions should use `lower_case_with_underscores`.
+* Structures, types and objects should use `CamelCaseWithoutUnderscores`.
+* Macros and #defines should use `UPPER_CASE_WITH_UNDERSCORES`.
+* All symbols should be prefixed with a short (2–4 characters) version of the
+  namespace.
+* All methods of an object should also be prefixed with the object name.
+
+Additionally, public headers should be included from a subdirectory,
+effectively namespacing the header files. For example, instead of
+`#include <abc.h>`, a project should allow its users to use
+`#include <namespace/ns-abc.h>`.
+
+For example, for a project called ‘Walbottle’, the short namespace ‘Wbl’ would
+be chosen. If it has a ‘schema’ object and a ‘writer’ object, it would install
+headers:
+* `$PREFIX/include/walbottle-$API_MAJOR/walbottle/wbl-schema.h`
+* `$PREFIX/include/walbottle-$API_MAJOR/walbottle/wbl-writer.h`
+
+(The use of `$API_MAJOR` above is for
+[parallel installability](https://wiki.apertis.org/Guidelines/Module_setup#Parallel_installability).)
+
+For the schema object, the following symbols would be exported (amongst
+others), following GObject conventions:
+* `WblSchema` structure
+* `WblSchemaClass` structure
+* `WBL_TYPE_SCHEMA` macro
+* `WBL_IS_SCHEMA` macro
+* `wbl_schema_get_type` function
+* `wbl_schema_new` function
+* `wbl_schema_load_from_data` function
+
+## Modularity
+
+[Modularity](http://en.wikipedia.org/wiki/Modular_programming),
+[encapsulation](http://en.wikipedia.org/wiki/Encapsulation_%28object-oriented_programming%29)
+and [loose coupling](http://en.wikipedia.org/wiki/Loose_coupling) are core
+computer science concepts which are necessary for development of maintainable
+systems. Tightly coupled systems require large amounts of effort to change,
+due to each change affecting a multitude of other, seemingly unrelated pieces
+of code. Even for smaller projects, good modularity is highly recommended, as
+these systems may grow to be larger, and refactoring for modularity takes a
+lot of effort.
+
+Assuming the general concepts of modularity, encapsulation and loose coupling
+are well known, here are some guidelines for implementing them which are
+specific to GLib and GObject APIs:
+
+1. The private structure for a GObject should not be in any header files
+   (whether private or public). It should be in the C file defining the
+   object, as should all code which implements that structure and mutates it.
+1. libtool convenience libraries should be used freely to allow internal code
+   to be used by multiple public libraries or binaries. However, libtool
+   convenience libraries must not be installed on the system. Use
+   `noinst_LTLIBRARIES` in `Makefile.am` to declare a convenience library; not
+   `lib_LTLIBRARIES`.
+1. Restrict the symbols exported by public libraries by using
+   `my_library_LDFLAGS = -export-symbols my-library.symbols`, where
+   `my-library.symbols` is a text file listing the names of the functions to
+   export, one per line. This prevents internal or private functions from
+   being exported, which would break encapsulation.
+   See
+   [Exposing and Hiding Symbols](https://autotools.io/libtool/symbols.html).
+1. Do not put any members (e.g. storage for object state or properties) in a
+   public GObject structure — they should all be encapsulated in a private
+   structure declared using
+   [`G_DEFINE_TYPE_WITH_PRIVATE`](https://developer.gnome.org/gobject/stable/gobject-Type-Information.html#G-DEFINE-TYPE-WITH-PRIVATE:CAPS).
+1. Do not use static variables inside files or functions to preserve function
+   state between calls to it. Instead, store the state in an object (e.g. the
+   object the function is a method of) as a private member variable (in the
+   object’s private structure). Using static variables means the state is
+   shared between all instances of the object, which is almost always
+   undesirable, and leads to confusing behaviour.
+
+## Pre- and post-condition assertions
+
+An important part of secure coding is ensuring that incorrect data does not
+propagate far through code — the further some malicious input can propagate,
+the more code it sees, and the greater potential there is for an exploit to
+be possible.
+
+A standard way of preventing the propagation of invalid data is to check all
+inputs to, and outputs from, all publicly visible functions in a library or
+module. There are two levels of checking:
+
+* Assertions: Check for programmer errors and abort the program on failure.
+* Validation: Check for invalid input and return an error gracefully on
+  failure.
+
+Validation is a complex topic, and is handled using
+[GErrors](#gerror-usage). The remainder of this section discusses pre- and
+post-condition assertions, which are purely for catching programmer errors. A
+programmer error is where a function is called in a way which is documented as
+disallowed. For example, if `NULL` is passed to a parameter which is documented
+as requiring a non-`NULL` value to be passed; or if a negative value is passed
+to a function which requires a positive value. Programmer errors can happen on
+output too — for example, returning `NULL` when it is not documented to, or not
+setting a GError output when it fails.
+
+Adding pre- and post-condition assertions to code is as much about ensuring the
+behaviour of each function is correctly and completely documented as it is
+about adding the assertions themselves. All assertions should be documented,
+preferably by using the relevant
+[gobject-introspection annotations](https://wiki.gnome.org/Projects/GObjectIntrospection/Annotations),
+such as `(nullable)`.
+
+Pre- and post-condition assertions are implemented using
+[`g_return_if_fail()`](https://developer.gnome.org/glib/stable/glib-Warnings-and-Assertions.html#g-return-if-fail)
+and
+[`g_return_val_if_fail()`](https://developer.gnome.org/glib/stable/glib-Warnings-and-Assertions.html#g-return-val-if-fail).
+
+The pre-conditions should check each parameter at the start of the function,
+before any other code is executed (even retrieving the private data structure
+from a GObject, for example, since the GObject pointer could be `NULL`). The
+post-conditions should check the return value and any output parameters at the
+end of the function — this requires a single return statement and use of `goto`
+to merge other control paths into it. See
+[Single-path cleanup](https://wiki.apertis.org/Guidelines/Memory_management#Single-path_cleanup)
+for an example.
+
+A fuller example is given in this
+[writeup of post-conditions](https://tecnocode.co.uk/2010/12/19/postconditions-in-c/).
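+
+As a minimal sketch, assuming a hypothetical GObject type NsFrob (its
+cast-check macro and internal getter below are placeholders), the pattern
+looks like this:
+
+```c
+gchar *
+ns_frob_dup_name (NsFrob  *self,
+                  GError **error)
+{
+  gchar *name;
+
+  /* Pre-conditions: catch programmer errors as early as possible,
+   * before even touching the object's private data. */
+  g_return_val_if_fail (NS_IS_FROB (self), NULL);
+  g_return_val_if_fail (error == NULL || *error == NULL, NULL);
+
+  name = g_strdup (ns_frob_peek_name (self));  /* hypothetical helper */
+
+  /* Post-condition: this function is documented never to return NULL,
+   * so warn loudly if that promise is about to be broken. */
+  g_warn_if_fail (name != NULL);
+
+  return name;
+}
+```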
+
+## GError usage
+
+[`GError`](https://developer.gnome.org/glib/stable/glib-Error-Reporting.html)
+is the standard error reporting mechanism for GLib-using code, and can be
+thought of as a C implementation of an
+[exception](http://en.wikipedia.org/wiki/Exception_handling).
+
+Any kind of runtime failure (anything which is not a
+[programmer error](#pre-and-postcondition-assertions)) must be handled by
+including a `GError**` parameter in the function, and setting a useful and
+relevant GError describing the failure, before returning from the function.
+Programmer errors must not be handled using GError: use assertions,
+pre-conditions or post-conditions instead.
+
+GError should be used in preference to a simple return code, as it can convey
+more information, and is also supported by all GLib tools. For example,
+introspecting an API with
+[GObject introspection](https://wiki.gnome.org/Projects/GObjectIntrospection)
+will automatically detect all GError parameters so that they can be converted
+to exceptions in other languages.
+
+Printing warnings to the console must not be done in library code: use a
+GError, and the calling code can propagate it further upwards, decide to
+handle it, or decide to print it to the console. Ideally, the only code which
+prints to the console will be top-level application code, and not library
+code.
+
+Any function call which can take a `GError**` **should** take such a
+parameter, and the returned GError should be handled appropriately. There are
+very few situations where ignoring a potential error by passing `NULL` to a
+`GError**` parameter is acceptable.
+
+The GLib API documentation contains a
+[full tutorial for using GError](https://developer.gnome.org/glib/stable/glib-Error-Reporting.html#glib-Error-Reporting.description).
+
+## GList
+
+GLib provides several container types for sets of data:
+
+- [`GList`](https://developer.gnome.org/glib/stable/glib-Doubly-Linked-Lists.html)
+- [`GSList`](https://developer.gnome.org/glib/stable/glib-Singly-Linked-Lists.html)
+- [`GPtrArray`](https://developer.gnome.org/glib/stable/glib-Pointer-Arrays.html)
+- [`GArray`](https://developer.gnome.org/glib/stable/glib-Arrays.html)
+
+It has been common practice in the past to use GList in all situations where a
+sequence or set of data needs to be stored. This is inadvisable — in most
+situations, a GPtrArray should be used instead. It has lower memory overhead
+(a third to a half of an equivalent list), better cache locality, and the same
+or lower algorithmic complexity for all common operations. The only typical
+situation where a GList may be more appropriate is when dealing with ordered
+data, which requires expensive insertions at arbitrary indexes in the array.
+
+See this
+[article on linked list performance](http://www.codeproject.com/Articles/340797/Number-crunching-Why-you-should-never-ever-EVER-us)
+for more details.
+
+If linked lists are used, be careful to keep the complexity of operations on
+them low, using standard CS complexity analysis. Any operation which uses
+[`g_list_nth()`](https://developer.gnome.org/glib/2.30/glib-Doubly-Linked-Lists.html#g-list-nth)
+or
+[`g_list_nth_data()`](https://developer.gnome.org/glib/2.30/glib-Doubly-Linked-Lists.html#g-list-nth-data)
+is almost certainly wrong.
+For example, iteration over a GList should be implemented using the linking
+pointers, rather than an incrementing index:
+
+```c
+GList *some_list, *l;
+
+for (l = some_list; l != NULL; l = l->next)
+  {
+    gpointer element_data = l->data;
+
+    /* Do something with @element_data. */
+  }
+```
+
+Using an incrementing index instead results in a quadratic decrease in
+performance (O(2×N^2) rather than O(N)):
+
+```c
+GList *some_list;
+guint i;
+
+/* This code is inefficient and should not be used in production. */
+for (i = 0; i < g_list_length (some_list); i++)
+  {
+    gpointer element_data = g_list_nth_data (some_list, i);
+
+    /* Do something with @element_data. */
+  }
+```
+
+The performance penalty comes from `g_list_length()` and `g_list_nth_data()`,
+which both traverse the list (O(N)) to perform their operations.
+
+Implementing the above with a GPtrArray has the same complexity as the first
+(correct) GList implementation, but better cache locality and lower memory
+consumption, so will perform better for large numbers of elements:
+
+```c
+GPtrArray *some_array;
+guint i;
+
+for (i = 0; i < some_array->len; i++)
+  {
+    gpointer element_data = some_array->pdata[i];
+
+    /* Do something with @element_data. */
+  }
+```
+
+## Magic values
+
+Do not use constant values in code without documenting them. These values can
+be known as ‘magic’ values, because it is not clear how they were chosen, what
+they depend on, or when they need to be updated.
+
+Magic values should be:
+
+* defined as macros using `#define`, rather than being copied to every usage
+  site;
+* all defined in an easy-to-find location, such as the top of the source code
+  file; and
+* documented, including information about how they were chosen, and what that
+  choice depended on.
+
+One situation where magic values are used incorrectly is to circumvent the
+type system. For example, a magic string value which indicates a special state
+for a string variable. Magic values should not be used for this, as the
+software state could then be corrupted if user input includes that string (for
+example). Instead, a separate variable should be used to track the special
+state. Use the type system to do this work for you — magic values should never
+be used as a basic dynamic typing system.
+
+## Asynchronous methods
+
+Long-running blocking operations should not be run such that they block the UI
+in a graphical application. This happens when one iteration of the UI’s main
+loop takes significantly longer than the frame refresh rate, so the UI is not
+refreshed when the user expects it to be. Interactivity reduces and animations
+stutter. In extreme cases, the UI can freeze entirely until a blocking
+operation completes. This should be avoided at all costs.
+
+Similarly, in non-graphical applications that respond to network requests or
+[D-Bus inter-process communication](https://wiki.apertis.org/Guidelines/D-Bus_services),
+blocking the main loop prevents the next request from being handled.
+
+There are two possible approaches for preventing the main loop being blocked:
+
+1. Running blocking operations asynchronously in the main thread, using polled
+   I/O.
+1. Running blocking operations in separate threads, with the main loop in the
+   main thread.
+
+The second approach (see
+[Threading](https://wiki.apertis.org/Guidelines/Threading)) typically leads to
+complex locking and synchronisation between threads, and introduces many bugs.
+
+The recommended approach in GLib applications is to use asynchronous
+operations, implemented using
+[`GTask`](https://developer.gnome.org/gio/stable/GTask.html)
+and [`GAsyncResult`](https://developer.gnome.org/gio/stable/GAsyncResult.html).
+Asynchronous operations must be implemented everywhere for this approach to
+work: any use of a blocking, synchronous operation will effectively make all
+calling functions blocking and synchronous too.
+
+The documentation for
+[`GTask`](https://developer.gnome.org/gio/stable/GTask.html)
+and [`GAsyncResult`](https://developer.gnome.org/gio/stable/GAsyncResult.html)
+includes examples and tutorials for implementing and using GLib-style
+asynchronous functions.
+
+Key principles for using them:
+
+1. Never call synchronous methods: always use the `*_async()` and `*_finish()`
+   variant methods.
+1. Never use threads for blocking operations if an asynchronous alternative
+   exists.
+1. Always wait for an asynchronous operation to complete (i.e. for its
+   `GAsyncReadyCallback` to be invoked) before starting operations which
+   depend on it.
+   * Never use a timeout (`g_timeout_add()`) to wait until an asynchronous
+     operation ‘should’ complete. The time taken by an operation is
+     unpredictable, and can be affected by other applications, kernel
+     scheduling decisions, and various other system processes which cannot be
+     predicted.
+
+## Enumerated types and booleans
+
+In many cases, enumerated types should be used instead of booleans:
+
+1. Booleans are not self-documenting in the same way as enums are. When
+   reading code it can be easy to misunderstand the sense of the boolean and
+   get things the wrong way round.
+1. They are not extensible. If a new state is added to a property in future,
+   the boolean would have to be replaced — if an enum is used, a new value
+   simply has to be added to it.
+
+This is documented well in the article
+[Use Enums Not Booleans](http://c2.com/cgi/wiki?UseEnumsNotBooleans).
+
+## GObject properties
+
+[Properties on GObjects](https://developer.gnome.org/gobject/stable/gobject-properties.html)
+are a key feature of GLib-based object orientation. Properties should be used
+to expose state variables of the object. A guiding principle for the design of
+properties is that (in pseudo-code):
+
+```
+var temp = my_object.some_property
+my_object.some_property = "new value"
+my_object.some_property = temp
+```
+
+should leave `my_object` in exactly the same state as it was originally.
+Specifically, properties should **not** act as parameterless methods,
+triggering state transitions or other side-effects.
+
+## Resource leaks
+
+As well as [memory leaks](https://wiki.apertis.org/Guidelines/Memory_management),
+it is possible to leak resources such as GLib timeouts, open file descriptors
+or connected GObject signal handlers. Any such resources should be treated
+using the same principles as allocated memory.
+
+For example, the source ID returned by
+[`g_timeout_add()`](https://developer.gnome.org/glib/stable/glib-The-Main-Event-Loop.html#g-timeout-add)
+must always be stored and removed (using
+[`g_source_remove()`](https://developer.gnome.org/glib/stable/glib-The-Main-Event-Loop.html#g-source-remove))
+when the owning object is finalised. This is because it is very rare that we
+can guarantee the object will live longer than the timeout period — and if the
+object is finalised, the timeout left uncancelled, and then the timeout
+triggers, the program will typically crash due to accessing the object’s
+memory after it’s been freed.
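+
+A minimal sketch of this cleanup pattern, using a hypothetical MyWidget
+object that polls something once per second, could look like this:
+
+```c
+#include <glib-object.h>
+
+typedef struct
+{
+  GObject parent_instance;
+  guint timeout_id;   /* 0 when no timeout is pending */
+} MyWidget;
+
+typedef struct
+{
+  GObjectClass parent_class;
+} MyWidgetClass;
+
+G_DEFINE_TYPE (MyWidget, my_widget, G_TYPE_OBJECT)
+
+static gboolean
+my_widget_poll_cb (gpointer user_data)
+{
+  /* ... periodic work on the MyWidget passed as @user_data ... */
+  return G_SOURCE_CONTINUE;
+}
+
+static void
+my_widget_dispose (GObject *object)
+{
+  MyWidget *self = (MyWidget *) object;
+
+  /* Cancel the pending timeout so it cannot fire after the object's
+   * memory has been freed. */
+  if (self->timeout_id != 0)
+    {
+      g_source_remove (self->timeout_id);
+      self->timeout_id = 0;
+    }
+
+  G_OBJECT_CLASS (my_widget_parent_class)->dispose (object);
+}
+
+static void
+my_widget_class_init (MyWidgetClass *klass)
+{
+  G_OBJECT_CLASS (klass)->dispose = my_widget_dispose;
+}
+
+static void
+my_widget_init (MyWidget *self)
+{
+  self->timeout_id = g_timeout_add_seconds (1, my_widget_poll_cb, self);
+}
+```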
+
+Similarly for signal connections, the signal handler ID returned by
+[`g_signal_connect()`](https://developer.gnome.org/gobject/stable/gobject-Signals.html#g-signal-connect)
+should always be saved and explicitly disconnected
+([`g_signal_handler_disconnect()`](https://developer.gnome.org/gobject/stable/gobject-Signals.html#g-signal-handler-disconnect))
+unless the object being connected is guaranteed to live longer than the object
+being connected to (the one which emits the signal).
+
+Other resources which can be leaked, plus the functions acquiring and
+releasing them (this list is non-exhaustive):
+
+* File descriptors (FDs):
+  * [`g_open()`](https://developer.gnome.org/glib/stable/glib-File-Utilities.html#g-open)
+  * [`g_close()`](https://developer.gnome.org/glib/stable/glib-File-Utilities.html#g-close)
+* Threads:
+  * [`g_thread_new()`](https://developer.gnome.org/glib/stable/glib-Threads.html#g-thread-new)
+  * [`g_thread_join()`](https://developer.gnome.org/glib/stable/glib-Threads.html#g-thread-join)
+* Subprocesses:
+  * [`g_spawn_async()`](https://developer.gnome.org/glib/stable/glib-Spawning-Processes.html#g-spawn-async)
+  * [`g_spawn_close_pid()`](https://developer.gnome.org/glib/stable/glib-Spawning-Processes.html#g-spawn-close-pid)
+* D-Bus name watches:
+  * [`g_bus_watch_name()`](https://developer.gnome.org/gio/stable/gio-Watching-Bus-Names.html#g-bus-watch-name)
+  * [`g_bus_unwatch_name()`](https://developer.gnome.org/gio/stable/gio-Watching-Bus-Names.html#g-bus-unwatch-name)
+* D-Bus name ownership:
+  * [`g_bus_own_name()`](https://developer.gnome.org/gio/stable/gio-Owning-Bus-Names.html#g-bus-own-name)
+  * [`g_bus_unown_name()`](https://developer.gnome.org/gio/stable/gio-Owning-Bus-Names.html#g-bus-unown-name)
+
+## External links
+
+* [Pre- and post-condition assertions article](https://tecnocode.co.uk/2010/12/19/postconditions-in-c/)
+* [Exposing and hiding library symbols article](https://autotools.io/libtool/symbols.html)
+* [GError tutorial](https://developer.gnome.org/glib/stable/glib-Error-Reporting.html#glib-Error-Reporting.description)
diff --git a/content/designs/connectivity-documentation.md b/content/designs/connectivity-documentation.md
new file mode 100644
index 0000000000000000000000000000000000000000..56b977a0a4f1790dfd8bab072af61d5287432b19
--- /dev/null
+++ b/content/designs/connectivity-documentation.md
@@ -0,0 +1,483 @@
+---
+title: Connectivity documentation
+short-description: Collection of connectivity resources (general-design)
+authors:
+  - name: Gustavo Padovan
+---
+
+# Connectivity documentation
+
+## Writing ConnMan plugins
+
+The plugin documentation in ConnMan was improved and submitted upstream.
+The documentation about writing plugins can be found in the ConnMan
+sources in the following files: *doc/plugin-api.txt*, *src/device.c* and
+*src/network.c*. Example plugins are plugins/bluetooth.c, plugins/wifi.c
+and plugins/ofono.c, among others.
+
+## Custom ConnMan Session policies
+
+The documentation on creating Session policy files for specific users
+and/or groups can be found in the ConnMan sources in
+*doc/session-policy-format.txt*. The policy files shall be placed in the
+`STORAGEDIR/session_policy_local` directory, where STORAGEDIR by default
+points to /var/lib/connman. ConnMan recognizes changes to this directory
+at runtime and updates Session policies accordingly.
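+
+As an illustration only, a per-user policy file dropped into that
+directory might look roughly like the sketch below. The group name, the
+user name and the exact key spellings here are assumptions to be checked
+against *doc/session-policy-format.txt*, which remains the authoritative
+reference:
+
+```ini
+[policy_navigation]
+uid = navigation
+AllowedBearers = wifi cellular
+ConnectionType = internet
+```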
+
+## Management of ConnMan Sessions
+
+ConnMan provides an extensive API to manage the creation, configuration
+and removal of a session. *doc/manager-api.txt* details how to create
+and destroy a Session through the CreateSession() and DestroySession()
+methods. *doc/session-api.txt* details how to use a Session. Through
+this API an application can ask ConnMan to connect or disconnect a
+Session or change its settings. The settings can also be changed by
+writing policy files as described in the previous topic.
+
+The application requesting a Session needs to implement a Notification
+API to receive updates to the Session settings, such as when a Session
+becomes online. This is done via the Update() method.
+
+See also *doc/session-overview.txt*.
+
+The difference between using the Session API and the policy files in
+/var/lib/connman is that policy files can set policies for many sessions
+at the same time, based on user/group ID or SELinux rules, while the
+Session API only changes one session at a time.
+
+## WiFi radio start-up behavior in ConnMan
+
+At the very first run ConnMan has the WiFi radio disabled by default;
+however, sometimes it is important to have the radio enabled even on the
+first ConnMan run. To achieve this behavior ConnMan can be configured to
+enable the radio on its first run.
+
+The file STORAGEDIR/settings, where STORAGEDIR by default points to
+/var/lib/connman, shall be edited, or even created, to have the
+following content:
+
+```ini
+[WiFi]
+
+Enable=true
+```
+
+This configuration will tell ConnMan at start-up to enable the WiFi
+radio.
+
+## Supporting new data modems in oFono
+
+oFono has great support for most of the modems on the market; however,
+some new modems may not work out of the box, in which case we need to
+fix oFono to recognize and handle the new modem properly. There are a
+couple of different causes why a modem does not work with oFono. In this
+section we will detail them and show how oFono can be fixed.
+
+ - Modem match failure: if the udevng plugin in oFono fails to match
+   the new modem, its code needs to be fixed to recognize the new modem.
+   This kind of failure can be recognized by looking at the debug
+   output of the udevng plugin (debug output is enabled when running
+   ofonod with the '-d' option). If udevng doesn't say anything about
+   the new modem then it needs proper code to handle it. You can find an
+   example of how to edit plugins/udevng.c to support a new modem in
+   [oFono git](https://git.kernel.org/cgit/network/ofono/ofono.git/commit/?id=4cabdedafdc241706e342720a20bdfe3828dfadf).
+   The oFono git history has many examples of patches
+   to add support for new modems in plugins/udevng.c.
+
+ - Some other modems do not implement the specifications properly, and
+   thus oFono needs to implement 'quirks' to have these modems working
+   properly. Many examples of fixes can be found in the oFono git:
+
+   - <https://git.kernel.org/cgit/network/ofono/ofono.git/commit/?id=d1ac1ba3d474e56593ac3207d335a4de3d1f4a1d>
+
+   - <https://git.kernel.org/cgit/network/ofono/ofono.git/commit/?id=535ff69deddda292c7047620dc11336dfb480a0d>
+
+It is difficult to foresee the problems that can happen when trying a
+new modem due to the extensive number of commands and specifications
+oFono implements. Asking the [oFono community]
+could be very helpful to solve any issue with a new modem.
+
+## Writing new Telepathy Connection Managers
+
+New connection managers are implemented as separate components, each
+running in its own process.
+Telepathy defines the [D-Bus interfaces][telepathy-dbus]
+that each Connection Manager (CM) needs to implement. This is known as
+the Telepathy Specification.
+
+The Connection Managers need to expose a bus name in D-Bus that begins
+with *org.freedesktop.Telepathy.ConnectionManager*; for example, the
+telepathy-gabble CM has the
+*org.freedesktop.Telepathy.ConnectionManager.gabble* bus name to provide
+its XMPP protocol interfaces.
+
+A client that wants to talk to the available Connection Managers in the
+D-Bus Session bus needs to call the D-Bus *ListActivatableNames* method
+and search the returned names for that prefix.
+
+The most important interfaces that a Connection Manager needs to
+implement are *ConnectionManager*, *Connection* and *Channel*. The
+*ConnectionManager* handles creation and destruction of *Connection*
+objects. A *Connection* object represents a connected protocol session,
+such as an XMPP session. Within a *Connection* many *Channel* objects
+can be created; they are used for communication between the application
+and the server providing the protocol service. A *Channel* can represent
+many different types of communication such as file transfers, incoming
+and outgoing messages, contact search, etc.
+
+Another important concept is the [Handle][telepathy-handle].
+It is basically a numeric ID representing various protocol resources,
+such as contacts, chatrooms, contact lists and user-defined groups.
+
+The [Telepathy Developer's Manual]
+details how to use the Telepathy API and thus gives many suggestions of
+how it should be implemented by a new Connection Manager.
+
+Studying the code of existing Connection Managers is informative when
+implementing a new one. Two good examples are [telepathy-gabble]
+for the XMPP protocol and [telepathy-rakia]
+for the SIP implementation.
+
+Those Connection Managers use [Telepathy-GLib]
+as a framework to implement the Telepathy Specification. The
+Telepathy-GLib repository has
+[a few examples][telepathy-glib-examples] of its usage.
+
+It is strongly recommended to use Telepathy-GLib when implementing any
+new connection manager. The Telepathy-GLib service-side API is only
+available in C, but it can also be accessed from other languages that
+can embed C, such as C++. This library is
+[fully documented][telepathy-glib-doc].
+
+### Looking inside the telepathy-rakia code
+
+To start, a small design document can be found at *docs/design.txt* in
+the telepathy-rakia sources. However, some parts of it are outdated.
+
+#### Source files
+
+ - *src/telepathy-rakia.c*: this is the starting point of
+   telepathy-rakia, as it instantiates its *ConnectionManager*.
+
+ - *src/sip-connection-manager.\[ch\]*: defines the
+   *ConnectionManagerClass* and requests the creation of a *Protocol*
+   of type *TpBaseProtocol*.
+
+ - *src/protocol.\[ch\]*: defines the *RakiaProtocolClass*, which
+   creates the *TpBaseProtocol* object. The protocol is responsible for
+   starting new *Connections*. The request comes in via D-Bus and
+   reaches here through Telepathy-GLib.
+
+ - *src/sip-connection.c*: defines the *RakiaConnectionClass*, which
+   inherits from *RakiaBaseConnectionClass*. The latter inherits from
+   *TpBaseConnectionClass*.
+
+ - *src/sip-connection-helpers.\[ch\]*: helper routines used by
+   *RakiaConnection*.
+
+ - *src/sip-connection-private.h*: private structures for
+   *RakiaConnection*.
+
+ - *src/write-mgr-file.c*: utility to produce manager files.
+
+ - *rakia/base-connection.\[ch\]*: base class for
+   *RakiaConnectionClass*.
+   It implements its parent,
+   *RakiaBaseConnectionClass*.
+
+ - *rakia/base-connection-sofia.\[ch\]*: implements a callback to
+   handle events from the SIP stack.
+
+ - *rakia/text-manager.\[ch\]*: defines *RakiaTextManagerClass*, to
+   manage the *RakiaTextChannel*.
+
+ - *rakia/text-channel.\[ch\]*: defines *RakiaTextChannelClass*. This
+   is a Telepathy *Channel*.
+
+ - *rakia/media-manager.\[ch\]*: defines *RakiaMediaManagerClass*.
+   Handles the *RakiaSipSession*.
+
+ - *rakia/sip-session.\[ch\]*: defines *RakiaSipSessionClass*; it
+   relates directly to the definition of Session in the SIP
+   specification.
+
+ - *rakia/call-channel.\[ch\]*: defines *RakiaCallChannelClass*. The
+   object is created when an incoming call arrives or an outgoing call
+   is placed. A *RakiaCallChannel* belongs to one *RakiaSipSession*.
+
+ - *rakia/sip-media.\[ch\]*: defines *RakiaSipMediaClass*. It is
+   created immediately after a *RakiaCallChannel* is created. Can
+   represent audio or video content.
+
+ - *rakia/call-content.\[ch\]*: defines *RakiaCallContentClass*. The
+   object is created for each new medium added. It relates directly to
+   the *Content* definition in the Telepathy specification. It could be
+   an audio or a video *Content*; it is matched one-to-one with a
+   *RakiaSipMedia* object.
+
+ - *rakia/call-stream.\[ch\]*: defines the *RakiaCallStreamClass*. It
+   could be an audio or a video object. The object is created by
+   *RakiaCallContent*.
+
+ - *rakia/codec-param-formats.\[ch\]*: helpers for setting codec
+   parameters.
+
+ - *rakia/connection-aliasing.\[ch\]*: defines functions for aliasing
+   *Connection*s.
+
+ - *rakia/debug.\[ch\]*: debug helpers.
+
+ - *rakia/event-target.\[ch\]*: helper to listen for events for an NUA
+   handle (see the NUA definition in the sofia-sip documentation).
+
+ - *rakia/handles.\[ch\]*: helpers for *Handle*s.
+
+ - *rakia/sofia-decls.h*: some extra declarations.
+
+ - *rakia/util.\[ch\]*: utility functions.
+
+#### sofia-sip
+
+[sofia-sip] is a User-Agent library that implements the SIP protocol
+as described in IETF RFC 3261. It can be used for VoIP, IM, and many
+other real-time and person-to-person communication services.
+telepathy-rakia makes use of sofia-sip to implement SIP support in
+Telepathy. sofia-sip has [good documentation][sofia-sip-doc]
+on all concepts, events and APIs.
+
+#### Connection Manager and creating connections
+
+*src/telepathy-rakia.c* is the starting point of this Telepathy SIP
+service. Its *main()* function does some of the initial setup,
+including D-Bus and logging, and calls Telepathy-GLib's
+*tp\_run\_connection\_manager()* method. The callback passed to this
+method gets called and constructs a new Telepathy *ConnectionManager*
+*GObject*. The Connection Manager factory is at
+*src/sip-connection-manager.c*.
+
+Once the Connection Manager object construction is finalized, the
+creation of a SIP Protocol object is triggered inside
+*rakia\_connection\_manager\_constructed()* by calling
+*rakia\_protocol\_new()*. This function is defined in
+*src/protocol.c*. It creates a Protocol object and adds the necessary
+infrastructure that a Connection Manager needs to manage the Protocol.
+In the class factory it is possible to see which methods are defined
+by this class by looking at the *TpBaseProtocolClass base\_class*
+variable:
+
+```c
+base_class->get_parameters = get_parameters;
+base_class->new_connection = new_connection;
+base_class->normalize_contact = normalize_contact;
+base_class->identify_account = identify_account;
+base_class->get_interfaces = get_interfaces;
+base_class->get_connection_details = get_connection_details;
+base_class->dup_authentication_types = dup_authentication_types;
+```
+
+Documentation on each method of this class can be found in the
+Telepathy-GLib documentation for
+[TpBaseConnectionManager] and [TpBaseProtocol].
+The *Protocol* is bound to the *ConnectionManager* through the method
+*tp\_base\_connection\_manager\_add\_protocol()*.
+
+The *new\_connection()* method defined there is used to create a new
+Telepathy *Connection* when the *NewConnection()* method on
+*org.freedesktop.Telepathy.ConnectionManager.rakia* is called.
+
+The Telepathy *Connection* object is of type *RakiaConnection*, which
+inherits from *RakiaBaseConnection*, which in turn inherits from
+*TpBaseConnection*. The methods used by *RakiaConnection* can be seen
+in the *RakiaConnectionClass* and *RakiaBaseConnectionClass*
+initializations. They are defined in *src/sip-connection.c*; for the
+*RakiaBaseConnectionClass*:
+
+```c
+sip_class->create_handle = rakia_connection_create_nua_handle;
+sip_class->add_auth_handler = rakia_connection_add_auth_handler;
+```
+
+and for the *TpBaseConnectionClass*:
+
+```c
+base_class->create_handle_repos = rakia_create_handle_repos;
+base_class->get_unique_connection_name = rakia_connection_unique_name;
+base_class->create_channel_managers = rakia_connection_create_channel_managers;
+base_class->create_channel_factories = NULL;
+base_class->disconnected = rakia_connection_disconnected;
+base_class->start_connecting = rakia_connection_start_connecting;
+base_class->shut_down = rakia_connection_shut_down;
+base_class->interfaces_always_present = interfaces_always_present;
+```
+
+During the *TpBaseConnection* object construction the
+`create_channel_managers` method is called. A *Channel* is an
+entity provided by a *Connection* to allow communication between
+the local *ConnectionManager* and the remote server providing the
+service. A *Channel* can represent an incoming or outgoing IM message,
+a file transfer, a video call, etc. Many *Channel*s can exist at a
+given time.
+
+#### Channels and Calls
+
+telepathy-rakia has two types of *Channel*s: *Text* and *Call*. For
+*TextChannel*s a *RakiaTextManager* object is created. It inherits
+from *TpChannelManager*. *TpChannelManager* is a generic type used by
+all types of *Channel*s. See *rakia/text-manager.c* for the
+*RakiaTextManagerClass* definitions. When constructed, in
+*rakia\_text\_manager\_constructed()*, the object sets the
+*connection\_status\_changed\_cb* callback to get notified about
+*Connection* status changes. If the *Connection* status changes to
+*Connected*, the callback is activated and the code sets yet another
+callback, *rakia\_nua\_i\_message\_cb*. This callback is connected to
+the corresponding nua-event from sofia-sip, and is responsible for
+managing an incoming message request from the remote server.
+
+The callback then handles the message it receives through the
+*Connection* using the sofia-sip library.
At the end of the function +the following code can be found: + +--- +channel = rakia_text_manager_lookup_channel (fac, handle); +if (!channel) + channel = rakia_text_manager_new_channel (fac, handle, handle, NULL); +rakia_text_channel_receive (channel, sip, handle, text, len); +--- + +The RakiaTextManager tries to figure if an existing *Channel* for this +message already exists, or if a new one needs to be created. Once the +channel is found or created, RakiaTextManager is notified of the +received message through *rakia\_text\_channel\_receive()* which +creates a *TpMessage* to wrap the received message. + +A similar process happens with the similar *RakiaMediaManager* which +handles SIP *Session*s and *Call* *Channel*s. The callback registered +by *RakiaMediaManager* is *rakia\_nua\_i\_invite\_cb()*, in +*rakia/media-manager.c*, it then can get notified of incoming invites +to create a SIP *Session*. Once the callback is activated, which means +when an incoming request to create a SIP *Session* arrives, a new +*RakiaSipSession* is created. Outgoing requests to create a SIP +session *RakiaSipSession* are initiated on the telepathy-rakia side +through the exposed D-Bus interface. The request comes from the +*TpChannelManager* object and is created by +*rakia\_media\_manager\_requestotron()* in the end of its call chain: + +--- +static void +channel_manager_iface_init (gpointer g_iface, gpointer iface_data) +{ + TpChannelManagerIface *iface = g_iface; + iface->foreach_channel = rakia_media_manager_foreach_channel; + iface->type_foreach_channel_class = rakia_media_manager_type_foreach_channel_class; + iface->request_channel = rakia_media_manager_request_channel; + iface->create_channel = rakia_media_manager_create_channel; + iface->ensure_channel = rakia_media_manager_ensure_channel; +} +--- + +Here in *channel\_manager\_iface\_init()*, telepathy-rakia sets which +method it wants to be called when the [D-Bus methods][telepathy-glib-dbus] +exposed by Telepathy-GLib are called. These functions handle *Channel* creation; +however, they must first create a SIP *Session* before creating the +*Channel* itself. The *RakiaSipSession* object will handle the +*Channel*s between the remote server and telepathy-rakia. + +In the incoming path besides of creating a new SIP session the +*rakia\_nua\_i\_invite\_cb* callback also sets a new callback +*incoming\_call\_cb*, that as it name says get called when a new call +arrives. + +*CallChannel*s, implemented as *RakiaCallChannel* in telepathy-rakia, +are then created once this callback is activated or, for outgoing call +channels requests, just after the *RakiaSipSession* is created. See +the calls to *new\_call\_channel()* inside *rakia/media-manager.c* for +more details. + +If *RakiaCallChannel* constructed was requested by the local user up +two new media streams would be created and added to it; the media can +be audio or video. The media streams, known as a *RakiaSipMedia* +object, is either created by the *CallChannel* constructed method if +[InitialAudio] or [InitialVideo] +is passed or by a later call to *AddContent()* on the D-Bus interface +*org.freedesktop.Telepathy.Channel.Type.Call1*. + +The creation of a *Content* object adds a “*m=*†line in the SDP in +the SIP message body. Refer to the RFC 3261 specification. + +The last important concept is a *CallStream*, implemented here as +*RakiaCallStream*. 
+A *CallStream* represents either a video or an
+audio stream to one specific remote participant, and is created
+through *rakia\_call\_content\_add\_stream()* every time a new
+*Content* object is created. In telepathy-rakia each *Content* object
+has only one *Stream*, because only one-to-one calls are supported.
+
+## Writing new Folks backends
+
+The [Folks documentation] on backends is fairly extensive and can
+help quite a lot when writing a new backend. Each backend should provide
+a subclass of [Folks.Backend].
+
+The same documentation can be found in the sources in the file
+*folks/backend.vala*. The evolution-data-server (EDS) backend will be
+used as an example here due to its extensive documentation. The EDS
+subclass of *Folks.Backend* is defined in *backends/eds/eds-backend.vala*
+in the sources.
+
+A backend also needs to implement the [Folks.Persona] and
+[Folks.PersonaStore] subclasses. For EDS those are [Edsf.Persona] and
+[Edsf.PersonaStore], which can also be seen in the sources in
+*backends/eds/lib/edsf-persona.vala* and
+*backends/eds/lib/edsf-persona-store.vala*, respectively.
+
+A *Persona* is the representation of a single contact in a given
+backend; *Personas* are stored by a *PersonaStore*. One backend may have
+many *PersonaStores* if it happens to have different sources of
+contacts. For instance, each EDS address book would have a
+*PersonaStore* associated with it. *Personas* from different *Backends*
+that represent the same physical person are aggregated together by the
+Folks core as an [Individual].
+
+The Telepathy backend also serves as a good example. Like the EDS
+backend, it is well implemented and documented.
+
+[oFono community]: https://ofono.org/community
+
+[telepathy-dbus]: http://telepathy.freedesktop.org/spec/
+
+[telepathy-handle]: http://telepathy.freedesktop.org/doc/book/sect.basics.handles.html
+
+[Telepathy Developer's Manual]: http://telepathy.freedesktop.org/doc/book/
+
+[telepathy-gabble]: http://cgit.freedesktop.org/telepathy/telepathy-gabble/
+
+[telepathy-rakia]: http://cgit.freedesktop.org/telepathy/telepathy-rakia/
+
+[Telepathy-GLib]: http://cgit.freedesktop.org/telepathy/telepathy-glib/
+
+[telepathy-glib-examples]: http://cgit.freedesktop.org/telepathy/telepathy-glib/tree/examples/README
+
+[telepathy-glib-doc]: http://telepathy.freedesktop.org/doc/telepathy-glib/
+
+[sofia-sip]: http://sofia-sip.sourceforge.net/
+
+[sofia-sip-doc]: http://sofia-sip.sourceforge.net/refdocs/nua/
+
+[TpBaseConnectionManager]: http://telepathy.freedesktop.org/doc/telepathy-glib/TpBaseConnectionManager.html
+
+[TpBaseProtocol]: http://telepathy.freedesktop.org/doc/telepathy-glib/telepathy-glib-base-protocol.html
+
+[telepathy-glib-dbus]: http://telepathy.freedesktop.org/spec/Connection_Interface_Requests.html
+
+[InitialVideo]: http://telepathy.freedesktop.org/spec/Channel_Type_Call.html#Property:InitialVideo
+
+[InitialAudio]: http://telepathy.freedesktop.org/spec/Channel_Type_Call.html#Property:InitialAudio
+
+[Folks documentation]: https://wiki.gnome.org/Folks
+
+[Folks.Backend]: http://telepathy.freedesktop.org/doc/folks/vala/Folks.Backend.html
+
+[Folks.Persona]: http://telepathy.freedesktop.org/doc/folks/vala/Folks.Persona.html
+
+[Folks.PersonaStore]: http://telepathy.freedesktop.org/doc/folks/vala/Folks.PersonaStore.html
+
+[Edsf.Persona]: http://telepathy.freedesktop.org/doc/folks-eds/vala/Edsf.Persona.html
+
+[Edsf.PersonaStore]: http://telepathy.freedesktop.org/doc/folks-eds/vala/Edsf.PersonaStore.html
+
+[Individual]: http://telepathy.freedesktop.org/doc/folks/vala/Folks.Individual.html
diff --git a/content/designs/connectivity.md b/content/designs/connectivity.md
new file mode 100644
index 0000000000000000000000000000000000000000..f40025b64beb0f90816088326d9298fa66dbe721
--- /dev/null
+++ b/content/designs/connectivity.md
@@ -0,0 +1,791 @@
+---
+title: Connectivity
+short-description: Connectivity management in Apertis (implemented)
+authors:
+  - name: Gustavo Noronha Silva
+---
+
+# Connectivity
+
+## Introduction
+
+Network management is the task of managing access to networks: in other
+words, deciding when and through which means to connect to the internet.
+In an IVI context this task is affected by several conflicting
+requirements. Connectivity may be spotty at times, with tunnels and high
+speeds causing WiFi networks to come and go quickly, low cell phone
+signal strength, and so on. On the other hand, connectivity may be very
+good while parked, since the user might have high-quality WiFi at the
+office and at home. Network management will be discussed in
+[][Network management].
+
+Online and cellular-based real-time communications, including chatting,
+voice calls, VoIP and video calls, are covered in
+[][Real-time communications].
+
+It is very common these days to have people carrying one or more smart
+devices with them. People want those smart devices to connect to their
+in-vehicle infotainment system for playing audio, importing contacts,
+and using or sharing Internet connections. This is discussed in
+[][Tethering from mobile devices].
+
+The main medium used for inter-device communication, Bluetooth, and its
+various profiles are discussed in [][Bluetooth support]. A brief
+discussion of using GPS to enhance network management, and of the
+GeoClue framework, is the subject of [][Global Positioning System (GPS)].
+
+Contacts management is covered by a separate document. Integration with
+other devices by means other than Bluetooth and USB mass-storage, such
+as reading songs off of an iPod, is the topic discussed in
+[][Media downloading].
+
+## Network management
+
+The main goals of network management in an IVI system are to make sure
+the best connection is being used at all times while providing enough
+information to applications so that they can apply reasonable policies.
+For example, the IVI system should be able to fall back to a metered 3G
+connection when an active WiFi connection is lost (because, say, the
+user drives their car out of their garage). In addition, big downloads
+should be paused in such a case; these would only be resumed when on an
+unmetered connection, to avoid significant charges on the user's phone
+bill.
+
+[ConnMan] is the central piece of the network management system being
+considered for Apertis. It is focused on mobile use cases, and provides
+good flexibility and features that allow implementation of the use cases
+mentioned above. [oFono] is the de-facto standard when it comes to
+cellular connections and related features, and it is able to work in
+cooperation with ConnMan.
+
+To complete the functionality expected from a modern network management
+framework, Bluetooth integration is also important. [BlueZ] is used
+to provide that integration, allowing ConnMan to use Bluetooth devices
+to go online. The illustration below gives a schematic view of the
+interactions among these frameworks.
### Switching to a different connection

There are very specific requirements about how and when the system should switch from an active Internet connection to a newly-available one. ConnMan developers are working on an infrastructure for policy plugins. Collabora believes this new infrastructure can be used to implement the policies required to satisfy the requirements.

The two main concerns are that the system should not switch to a WiFi network that just became available, since that may just be an open network in a café the car is passing by, but that it should also take advantage of good known connections when they are available.

ConnMan provides several facilities to gather information useful for such policy decisions. For instance, a network that has been manually selected by the user will have the Favorite property set to true.

That can be used to implement a policy of never automatically migrating to open WiFi networks that are detected, unless they have been successfully used before. This would guarantee that the system is able to switch automatically to relevant networks without running into the problem of trying to associate with every open network it passes.

Because connections may be lost or replaced by a better connection by ConnMan, applications need to be aware that their session may go away at any time, and be able to recover from that. When a connection change happens, ConnMan will emit a D-Bus signal, and applications may need to drop connections they have started and restart whatever they were doing.

A concrete example would be an email application that is connected to an IMAP server; when a connection change happens, the application gets notified that the connection it was using has gone away, so it drops all connections it had with the IMAP server. If the new connection satisfies the requirements specified by the email client on its ConnMan session, it gets a “now online” notification and reconnects to the server.

### Application requirements and expressing preferences

One very important characteristic of Apertis is that its Internet connectivity will vary a lot – going from a high speed WiFi network to a slow, unreliable, metered GSM connection and back is a common scenario, as discussed in the previous section. It may also be required that a particular type of connection be established to accommodate the needs of some applications.

ConnMan has a feature called Sessions. This feature provides a way for applications to tell ConnMan what they expect from an Internet connection, and to get from ConnMan a network connection status that is relative to those requirements.

For instance, an application that downloads podcasts may have a policy that it would only perform the downloads when on WiFi. This application would create a session with ConnMan, and specify settings that ConnMan uses to decide whether that session is to be considered online or not.

The main settings are **AllowedBearers** and **ConnectionType**. The first of these specifies which kinds of connections are allowed for the type of traffic this application intends to do. It is a simple list that specifies the preferred connection methods, such as **\[cellular, wifi, bluetooth\]**, which would specify a preference for cellular connections over both WiFi and Bluetooth.

A special “**\***” bearer can be used to specify all bearers, making it easy to specify a preference for one bearer over all others, which are then treated as equivalent. When one of the connections allowed for an application comes online, the session is declared to be online. When a change happens on a Session setting, ConnMan updates the application with the new values for the changed settings.

The second, **ConnectionType**, is used by the application to tell ConnMan whether the session needs to be online or whether local connectivity is enough. Local connectivity means only connectivity to the internal network is needed; for example, an application may want to exchange data with other devices inside the same network. There is not much use for this setting in Apertis.

An application can have more than one ConnMan session at the same time, allowing applications to specify multiple policies and preferences, and perform work according to what is actually available. In addition to the two settings discussed here, ConnMan also provides several settings that can be used to customize how both sessions and the system deal with [networking][connman-networking].
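As an illustration, the sketch below creates a session with the settings discussed above over ConnMan's D-Bus interface. It follows ConnMan's experimental session-api.txt documentation, so names and signatures may differ between ConnMan versions; the notifier path is a placeholder for an object the application would export itself.

```c
#include <gio/gio.h>

int
main (void)
{
  GError *error = NULL;
  GDBusConnection *bus = g_bus_get_sync (G_BUS_TYPE_SYSTEM, NULL, &error);

  if (bus == NULL)
    {
      g_warning ("Could not connect to the system bus: %s", error->message);
      return 1;
    }

  /* Build the a{sv} settings dictionary: prefer cellular, then treat
   * WiFi and Bluetooth as acceptable alternatives. */
  const gchar *bearers[] = { "cellular", "wifi", "bluetooth" };
  GVariantBuilder settings;
  g_variant_builder_init (&settings, G_VARIANT_TYPE ("a{sv}"));
  g_variant_builder_add (&settings, "{sv}", "AllowedBearers",
                         g_variant_new_strv (bearers, G_N_ELEMENTS (bearers)));
  g_variant_builder_add (&settings, "{sv}", "ConnectionType",
                         g_variant_new_string ("internet"));

  /* "/example/notifier" is a placeholder: the application must export
   * an object implementing net.connman.Notification at this path to
   * receive Update() calls with the session's state. */
  GVariant *reply = g_dbus_connection_call_sync (bus,
      "net.connman", "/", "net.connman.Manager", "CreateSession",
      g_variant_new ("(a{sv}o)", &settings, "/example/notifier"),
      G_VARIANT_TYPE ("(o)"), G_DBUS_CALL_FLAGS_NONE, -1, NULL, &error);

  if (reply == NULL)
    {
      g_warning ("CreateSession failed: %s", error->message);
      g_error_free (error);
      return 1;
    }

  g_variant_unref (reply);
  return 0;
}
```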
Note that the Session API is still in an experimental state, and its implementation and API are changing rapidly. This means both that it cannot be considered a stable part of the API supported by Apertis and that very few existing applications use its current form. This should not be a problem for Apertis, since applications are not intended to use ConnMan directly, so a wrapper API can be specified for the SDK.

### Binding to the appropriate network interface

ConnMan allows multiple connections to exist at the same time. This might be useful for various reasons but it also brings some complications with it. First of all, if an application wants to use a specific connection it needs to explicitly bind its network usage to the desired network interface.

However, binding to a specific interface requires the [NET_RAW capability], which is not something that regular applications should be allowed to have. A possible solution would be to delegate this binding to a special application that has the privileges to do such binding. The viability of such a solution needs to be properly investigated during the initial development of the feature.

Also, keeping in mind the desire to take complexity and control away from applications, it seems desirable to abstract this complexity away into the SDK. The SDK can provide APIs that wrap ConnMan functionality and handle binding for the application. This, however, means more Apertis-specific code, and thus less code reuse for existing applications.

### Connections policies and store applications manifests

As discussed above, some control can be exerted on how ConnMan ranks and chooses connections by having applications (or a system service on their behalf) provide ConnMan with a list of their requirements using the Session APIs.

It has been made clear that applications from the store should specify their needs as much as possible through the manifest file that will be distributed along with applications on the app store. For network management this mainly means specifying the allowed bearers.

The policy plugin mentioned above could use information provided by the application manager and application manifest files to decide on the best policy to implement. This would not require changing applications, but it limits the flexibility the developer has to work with.

### Network-related events and how applications need to behave

For applications that are written to work with ConnMan, two signals are essential: connection is up, connection is down. When a connection comes up the application takes the appropriate steps to start whatever its functionality is. An IMAP mail client would at this point connect to the IMAP server and look for new messages, a podcast downloader would look for new podcasts to start downloading or resume any downloads that had previously been started, and so on.

When the connection goes down – even if it's just being switched from one connection method to another – any existing IP connections would not work any more, since the IP address will have changed. The application needs to close any open connections. This means an IMAP mail client would close the sockets it had open with the IMAP server, a podcast downloader will close the HTTP or FTP connections, and so on. The connections can be re-established or resumed when a new notification comes in that the system is online once more.
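For a concrete picture of what these up/down events look like at the D-Bus level, the hedged sketch below watches the Manager's PropertyChanged signal for global state changes; a real application would normally rely on its own Session's notifications instead, as discussed above. Names follow the ConnMan 1.x D-Bus API and may differ between versions.

```c
#include <gio/gio.h>

static void
on_property_changed (GDBusConnection *connection,
                     const gchar     *sender_name,
                     const gchar     *object_path,
                     const gchar     *interface_name,
                     const gchar     *signal_name,
                     GVariant        *parameters,
                     gpointer         user_data)
{
  const gchar *name;
  GVariant *value;

  g_variant_get (parameters, "(&sv)", &name, &value);
  /* "State" moves between values such as "offline", "idle", "ready"
   * and "online"; this is where an application would drop or
   * re-establish its connections. */
  if (g_strcmp0 (name, "State") == 0)
    g_print ("ConnMan state: %s\n", g_variant_get_string (value, NULL));
  g_variant_unref (value);
}

int
main (void)
{
  GDBusConnection *bus = g_bus_get_sync (G_BUS_TYPE_SYSTEM, NULL, NULL);
  GMainLoop *loop = g_main_loop_new (NULL, FALSE);

  g_dbus_connection_signal_subscribe (bus, "net.connman",
                                      "net.connman.Manager", "PropertyChanged",
                                      "/", NULL, G_DBUS_SIGNAL_FLAGS_NONE,
                                      on_property_changed, NULL, NULL);
  g_main_loop_run (loop);
  return 0;
}
```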
The following is a potential list of applications and the events they will be interested in handling. As can be seen, the events an application needs to handle are essentially limited to having a connection and not having a connection any more.

**Email client**

  - Connected event

      - Connect to the IMAP server and check for new mail; note that IMAP
        connections are usually kept alive to receive notifications of
        new email from the server

      - Connect to the SMTP server to send any emails stored in the
        outgoing mail box

  - Disconnected event

      - Drop IMAP connections

      - Cancel ongoing loading of email messages

      - Cancel ongoing sending of email messages, making sure they stay
        in the outgoing mail box

**Media player**

  - Connected event

      - In case multiple connections are supported, when a faster
        connection appears, switch to it if needed

      - If media was being played when the connection was previously
        lost, resume buffering

  - Disconnected event

      - Drop connections

**Feed reader**

  - Connected event

      - Begin download of the latest entries

      - If it's a fast connection, begin pre-caching of images and other
        big feed attachments

  - Disconnected event

      - Drop connections

#### Continuing downloads

The HTTP protocol provides clients and servers with the ability to pick up a transfer from a given point, so that partially downloaded content does not need to be re-downloaded in full when a connection is dropped and reconnected. Details about how the protocol supports partial downloads can be found in [RFC2616].

In summary, when picking up a download the client should send a *Range* header specifying the bytes it wants to download. If the server supports continuing downloads and the range is acceptable, a ***206 Partial Content*** response will be sent instead of the usual ***200 OK*** one. The client can then append the data to the partially downloaded file. If the server does not reply with a ***206*** response, then the file needs to be truncated, since the download will be starting from scratch.
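The sketch below shows what resuming a download could look like with libsoup's synchronous API; it assumes libsoup 2.4, and the URL and offset are placeholders.

```c
#include <libsoup/soup.h>

static void
resume_download (const gchar *url, goffset bytes_already_downloaded)
{
  SoupSession *session = soup_session_sync_new ();
  SoupMessage *msg = soup_message_new ("GET", url);

  /* Ask only for the bytes we are missing; -1 means "to the end". */
  soup_message_headers_set_range (msg->request_headers,
                                  bytes_already_downloaded, -1);

  guint status = soup_session_send_message (session, msg);

  if (status == SOUP_STATUS_PARTIAL_CONTENT)
    {
      /* 206: append msg->response_body->data to the partial file. */
    }
  else if (status == SOUP_STATUS_OK)
    {
      /* 200: the server ignored the range; truncate the partial file
       * and start the download over from scratch. */
    }

  g_object_unref (msg);
  g_object_unref (session);
}
```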
### Connectivity policies on Android

Connectivity policies on Android are much simpler. The Android system does not implement any per-application configuration of how applications should access the internet (WiFi, 3G, etc.).

Also, Android does not have any mechanism to notify applications that the system is online: applications just get a notification about a Network State Change and then have to figure out by themselves whether the system is online by requesting a “route to host”.

One of the few network configurations Android has is to enable/disable WiFi, mobile data and roaming allowance globally. Apart from that, the user can also restrict background data usage for each application in the Global Settings.

## Real-time communications

The Apertis system needs to be well connected with Internet communication services such as Google Talk and Skype. [Telepathy] provides a framework for enabling messaging, video and audio calling through many of the existing services, and more can be supported by developing custom [connection managers][telepathy-connection-manager].

Telepathy provides a D-Bus API which abstracts away the specific connection managers, allowing a UI to seamlessly support various protocols while having little or no protocol-specific knowledge. The fact that Telepathy is implemented as several D-Bus services makes it possible to integrate messaging and voice features throughout the system.

A good example of this are the chat, dialler and address book applications present on the Nokia N900 and [N9 devices], which use Telepathy to support messaging, GSM and VoIP calls using one single user interface, while at the same time providing ways for the user to choose which protocol they want for a given conversation.

Existing open-source connection managers support messaging through Jabber, GTalk, Facebook, Live Messenger, and more. Audio and video calls are also supported over Jabber, GTalk and SIP. See the [Telepathy wiki] for more details. Before deciding on shipping any of these, however, it's important to verify whether any legal issues may arise, mainly related to trademarks.

As discussed before, Telepathy is pluggable, enabling a mix of closed and open-source connection managers to coexist. This enables OEMs to enable as many third-party services as desired, requiring, on the technical side, at most the creation of a new connection manager.

Collabora has been involved in consultancy projects to integrate various proprietary backends in the past and is ready to do so again if it is decided to include support for more protocols. More details about this will be included in reference documentation produced during the development phase by the documentation team.

### Traditional Telephony (GSM, SMS)

The system shall support making and receiving calls and sending/receiving text messages through a paired cell phone. Telepathy has a backend on top of oFono to make calls and send text messages, which makes it possible to easily have a single, integrated user interface for both regular phone calls and messages along with those of online services.

Telepathy is focused on messaging and calling; as such, it does not include GSM/UMTS-specific functionality like signal strength, data connections, and so on. Those features are accessible through oFono directly.
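As a hedged sketch of how a dialler built on Telepathy might place a GSM call through the paired phone, the snippet below uses telepathy-glib's TpAccountChannelRequest convenience API. It assumes an already-prepared TpAccount wrapping the phone's connection, and exact names may vary between telepathy-glib versions.

```c
#include <telepathy-glib/telepathy-glib.h>

static void
call_requested_cb (GObject *source, GAsyncResult *result, gpointer user_data)
{
  GError *error = NULL;
  TpAccountChannelRequest *request = TP_ACCOUNT_CHANNEL_REQUEST (source);

  if (!tp_account_channel_request_ensure_channel_finish (request, result,
                                                         &error))
    {
      g_warning ("Could not place call: %s", error->message);
      g_error_free (error);
    }
}

static void
dial_number (TpAccount *account, const gchar *number)
{
  TpAccountChannelRequest *request =
      tp_account_channel_request_new_audio_call (account,
          TP_USER_ACTION_TIME_CURRENT_TIME);

  tp_account_channel_request_set_target_id (request, TP_HANDLE_TYPE_CONTACT,
                                            number);

  /* Hand the resulting Call channel to the registered dialler UI. */
  tp_account_channel_request_ensure_channel_async (request, NULL, NULL,
                                                   call_requested_cb, NULL);
  g_object_unref (request);
}
```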
## Tethering from mobile devices

There are six main ways to hop on to a mobile device's Internet connection:

  - WiFi connection for devices that support mobile hotspot

  - Using the DUN Bluetooth profile, through oFono

  - Using the PAN Bluetooth profile

  - Using Ethernet over USB

  - Using a 3G USB modem

  - Using device-specific proprietary protocols

A mobile hotspot feature is becoming more common on mobile devices and, in a way, taking the place once occupied by Bluetooth for tethering. It is a good connection method because it is very simple to set up.

The PAN profile is supported by ConnMan through BlueZ; DUN also needs these two components to work, plus an extra one, which is oFono. This difference is due to the fact that the DUN profile behaves like a modem and thus needs oFono to handle it. All interactions between BlueZ and the two other daemons, ConnMan and oFono, are performed using BlueZ's D-Bus interface, and as such should not cause problems in case it is planned to replace BlueZ with a proprietary counterpart that implements the same interfaces.

Note that connecting a phone to the car for tethering over Bluetooth is a process that requires user intervention: the user needs to first pair the two devices. In most systems that support connections over the cell phone network the user is also asked to choose the plan they acquired from their provider from a list, which will also need to be done for Apertis; only then will the connection be made available through ConnMan.

Ethernet over USB is supported by Linux using the usbnet driver. Among device-specific protocols, Apple devices in particular are important. Linux includes, since version 2.6.34, the [ipheth] driver, which enables using Ethernet over the USB connection for Apple devices. In addition to the driver, pairing of the device by a user-space program is required. That pairing can be performed either by using the standalone tool provided at the project's web page or through the tools distributed by the **libimobiledevice** project, discussed in [][Media downloading].

Collabora believes the three main components discussed here, BlueZ, oFono and ConnMan, are capable of supporting tethering to most mobile devices. Provided appropriate user interfaces are implemented, ConnMan is able to satisfy all requirements regarding having several different phones in the car, including prioritizing and selecting which one should be used.

## Counting bytes and getting information about bandwidth

ConnMan provides an API called **Counters** that is used for tracking how much traffic has gone through a given connection, and can be used by the network connections management UI to inform the user about the quantity of data that has been transmitted. The counters are per-connection and are automatically updated with the information by ConnMan.
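Registering a counter is a single Manager call, sketched below following ConnMan's manager-api.txt; the accuracy is in kilobytes and the period in seconds. The counter object path is a placeholder for an object the application would export implementing net.connman.Counter (with its Usage() and Release() methods), which is omitted here.

```c
#include <gio/gio.h>

static void
register_counter (GDBusConnection *bus)
{
  GError *error = NULL;

  /* Ask ConnMan to report usage to /example/counter roughly every
   * 10 seconds, or whenever 1024 KiB have been transferred. */
  g_dbus_connection_call_sync (bus,
      "net.connman", "/", "net.connman.Manager", "RegisterCounter",
      g_variant_new ("(ouu)", "/example/counter", 1024u, 10u),
      NULL, G_DBUS_CALL_FLAGS_NONE, -1, NULL, &error);

  if (error != NULL)
    {
      g_warning ("RegisterCounter failed: %s", error->message);
      g_error_free (error);
    }
}
```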
oFono also provides the **CallMeter** API for tracking how much conversation time is still available for a GSM phone, using data from the SIM. oFono is able to emit a warning when the limits are close to being reached.

For measuring bandwidth there is no convenient API at the moment. Clients can register a counter and specify an update interval, but ConnMan advises against relying on that interval as a timing source. A more robust and correct implementation would be to have applications and services that care about that information track the RX/TX bytes and run a timer of their own to estimate how much bandwidth is being used at a given point in time.

It is important to note that tracking connection quality taking into account used bandwidth (and possibly other variables such as connection latency and available bandwidth) is not an easy task. Usually those variables don't give enough information to decide which connection has the better quality.

## Providing Internet connectivity to other devices

In case the Apertis system has Internet connectivity itself, it should be able to share it with other devices through either Bluetooth or WiFi.

ConnMan supports sharing the current Internet connection by using the WiFi interface in master mode or via the Bluetooth PAN profile, becoming an access point that other devices can connect to. This is done by turning on tethering mode on WiFi or Bluetooth.

> See the tethering properties at the bottom of
> <http://git.kernel.org/?p=network/connman/connman.git;a=blob;f=doc/technology-api.txt;h=851d5ea629975c9c82d86c7863aaab2997485c34;hb=HEAD>
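A minimal sketch of enabling WiFi tethering through ConnMan's Technology interface follows; the SSID and passphrase are placeholders, and the property names are taken from the technology-api.txt document linked above.

```c
#include <gio/gio.h>

static void
set_wifi_technology_property (GDBusConnection *bus,
                              const gchar     *name,
                              GVariant        *value)
{
  g_dbus_connection_call_sync (bus, "net.connman",
      "/net/connman/technology/wifi", "net.connman.Technology",
      "SetProperty", g_variant_new ("(sv)", name, value),
      NULL, G_DBUS_CALL_FLAGS_NONE, -1, NULL, NULL);
}

static void
enable_wifi_tethering (GDBusConnection *bus)
{
  /* Configure the access point first, then flip tethering on. */
  set_wifi_technology_property (bus, "TetheringIdentifier",
                                g_variant_new_string ("ApertisCar"));
  set_wifi_technology_property (bus, "TetheringPassphrase",
                                g_variant_new_string ("s3cr3tpass"));
  set_wifi_technology_property (bus, "Tethering",
                                g_variant_new_boolean (TRUE));
}
```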
As is the case with other features, a proper UI needs to be created to let the user turn tethering on as well as specify the desired SSID and pass-phrase for WiFi, or to pair the Bluetooth devices. In order for this feature to be provided, the driver for the wireless chip used in the development board needs to support master mode.

### A web server to provide information

Apertis will have a web server running internally to provide information about the system and the car for access by smart phones. Collabora's working assumption is the server will be available to devices that connect to the WiFi or Bluetooth hotspot provided by the system, regardless of whether it is being used to provide Internet connectivity to the devices or not. Libwebsockets can be used to implement such a web server.

For users to access the web server, the manual of the device will contain a specific URI, and the DNS server provided by the device will resolve the name to the address the system has in the address space used by its DHCP server.

## Bluetooth support

The automotive space is by far the biggest user of Bluetooth for communications between the car and external devices, such as phones, tablets, notebooks, and so on. There are plans to acquire a proprietary solution to supplement BlueZ in Apertis.

That solution sits in between applications that use BlueZ and the BlueZ daemon, and adapts requests to make sure specific device quirks are satisfied.

BlueZ is currently a fairly complete Bluetooth stack, and has support for all of the major [Bluetooth profiles][bluez-profiles]. It's important to note, however, that applications need to be written to use the Bluetooth infrastructure for connecting to mobile devices for music playing, remote control, file transfer, downloading of contacts and tethering.

For other Bluetooth profiles, the ones supported by BlueZ will be provided; development of support for more profiles is out of scope for this project. The list of Bluetooth profiles below was extracted from the Apertis Feature List document, and information is provided on the general level of support provided by BlueZ. For a more detailed list of existing support and gaps, Collabora would require a more detailed list of requirements.

When it comes to pairing support, BlueZ supports both Legacy Pairing (PIN entry; many old devices only support this type of pairing) and Secure Simple Pairing (Numeric Comparison).

### Bluetooth 3.0, 4.0, High Speed, L2CAP, SDP, RFCOMM

BlueZ currently supports both the Bluetooth 3.0 and 4.0 core specifications. However, High Speed support through 802.11 AMP is still under development, so it is not currently supported. The Bluetooth core support provided by BlueZ includes the Logical Link Control and Adaptation Protocol **(L2CAP)**, the Service Discovery Protocol **(SDP)** and Radio Frequency Communications **(RFCOMM)**.

### SPP

The Serial Port Profile **(SPP)** 1.1 is supported by BlueZ. The Serial Port Profile allows emulation of serial ports over a Bluetooth link.

### PAN and DUN

As discussed in sections [][Network management] and [][Tethering from mobile devices], BlueZ provides support for the Personal Area Networking **(PAN)** profile both in the NAP role (BlueZ acting as connection provider) and the PANU role (BlueZ using an Internet connection over Bluetooth).

There is support for the DUN profile. The client role is implemented by an additional oFono daemon and can be used to connect to devices providing an Internet connection.

There is also support for the server role of the Dial-up Networking **(DUN)** profile, which can be used with oFono and ConnMan to provide an Internet connection to an external device. However, the current implementation only supports sharing a connection over DUN if the device has an active GPRS data connection. Collabora thinks that the lack of DUN server support won't be a problem, as DUN is rapidly being replaced by the PAN profile.

### GOEP, OBEX

The Generic Object Exchange Profile **(GOEP)** and Object Exchange Protocol **(OBEX)** are also supported by BlueZ. They enable file exchange between Bluetooth-capable devices and the Apertis system.

### PBAP, MAP, OPP, SYNCH

The Phone Book Access Profile **(PBAP)** is supported by BlueZ, and can be used for downloading contacts from external devices. There is also support for the Object Push Profile **(OPP)**, used for transferring vCards and vCalendars. Finally, BlueZ also supports the Message Access Profile **(MAP)** version 1.0 in the client role, which can be used to download SMS messages and email from a phone onto the Apertis system; however, support for uploading, deleting and marking messages as read/unread is lacking at the moment.

Note that, although BlueZ includes support for these profiles, it's up to applications on the system to make use of the framework to provide the actual features. For instance, the contacts application needs to talk to BlueZ to perform the phone book download.

There is currently no support for the Synchronization **(SYNCH)** profile. Collabora's current understanding is this does not pose a problem for the use cases planned, since only support for download of the contacts is required.

For more information on the specific use-cases, problems and proposed solutions regarding contacts in the Apertis system refer to the Contacts design document prepared by Collabora.

### AVRCP, A2DP, VDP

These profiles are used to communicate with devices that are able to reproduce multimedia content and/or control media playing remotely.

BlueZ supports the Audio/Video Remote Control Profile 1.0 **(AVRCP)** in the controller role, but it only supports two commands at the moment: Volume Down and Volume Up.
Collabora recommends the development of the missing features for AVRCP version 1.0 and also support for version 1.4, which is not yet supported by upstream. This would provide metadata information about the media and folder browsing support.

When acting as a controller, an application needs to provide the user with an interface for inputting commands. Collabora will provide sample code for an application acting in the controller role.

Also included is support for acting as sink for the Advanced Audio Distribution Profile **(A2DP)** version 1.2, using PulseAudio to provide audio routing. When the device starts to send an A2DP stream to Apertis, PulseAudio will automatically make it available as an audio device.

PulseAudio has a module that can automatically redirect streams to new output devices. However, for systems with complex requirements for audio routing it's probably a better idea to have a system daemon or application managing that; the car system interface D-Bus daemon is one viable candidate.

The Video Distribution Profile **(VDP)** is not yet supported.

### HFP

The Hands-Free Profile **(HFP)** version 1.6 is supported by BlueZ. Hands-free is the technology that allows making phone calls with voice commands, and having audio routed from the phone to a different device, such as the Apertis system, which can then play it through the car speakers, for instance. The BlueZ framework, along with oFono, can be used to add hands-free support to the system. Wide Band Speech (high-quality audio for calls), though, is not yet supported by BlueZ.

After a SIM-enabled device in Audio Gateway mode has been paired with BlueZ, PulseAudio will be able to use it as source and sink through its Bluetooth module and route streams from the car's microphone to the phone and the audio from the call to the car's speakers. In this case BlueZ acts in the Hands-Free role.

The application which handles the calls can use PulseAudio APIs to control the volume of the source and sink streams, and should set the **filter.want** property of the [PulseAudio streams] to let PulseAudio know echo cancellation should be used.
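A minimal sketch of creating such a call stream follows; it assumes an already-connected pa_context, and the sample spec is a placeholder.

```c
#include <pulse/pulseaudio.h>

/* ctx is a pa_context already connected to the PulseAudio server. */
static pa_stream *
create_call_stream (pa_context *ctx)
{
  static const pa_sample_spec spec = {
    .format = PA_SAMPLE_S16LE, .rate = 8000, .channels = 1
  };
  pa_proplist *props = pa_proplist_new ();

  /* Ask PulseAudio to route this stream through the echo-cancellation
   * filter module when appropriate. */
  pa_proplist_sets (props, "filter.want", "echo-cancel");
  pa_proplist_sets (props, PA_PROP_MEDIA_ROLE, "phone");

  pa_stream *stream = pa_stream_new_with_proplist (ctx, "voice-call",
                                                   &spec, NULL, props);
  pa_proplist_free (props);
  return stream;
}
```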
This will cause PulseAudio to automatically load the echo cancellation module. The echo cancellation module can also contain a noise cancellation sub-module. PulseAudio ships with an Open Source sub-module based on speex for echo cancellation, but it can be replaced by custom or proprietary modules if required, which was the course chosen by Nokia for its phones, for instance. The same goes for the noise cancellation sub-module: it can easily be replaced by a proprietary noise cancellation solution just by rewriting the sub-module.

The diagram below shows how the various pieces of such a set-up are related.

### HSP

Currently BlueZ has no support for the Headset Profile **(HSP)**. Collabora recommends it be supported, but that would require development of the feature. VoIP applications rely on HSP to operate calls over Bluetooth. It is also important to mention that older phones only support the HSP profile for phone calls.

### GSM 07.07 AT-commands

oFono has support for most of the GSM 07.07 AT command set. It is through the AT command set that the phone is controlled in the HFP profile.

### GNSS

The Global Navigation Satellite System Profile is currently not supported by BlueZ; nonetheless, adding support would be simple on the BlueZ side. For this profile, BlueZ would only provide the Bluetooth connection handling; all the navigation-specific data would be passed to the GPS-specific application to handle.

## Global Positioning System (GPS)

The Apertis platform provided by Collabora will include the GeoClue geolocation framework. [GeoClue] provides a D-Bus service that can be queried to establish the current location of the system with configurable accuracy. GeoClue is able to use the GPS from the system to provide very accurate location information.

This technology will be used, for instance, to power the existing geolocation implementation of the WebKit-Clutter library. The service can be made available for use by store apps, potentially including selective restrictions on the accuracy each app can query. This should be discussed and specified during the development phase.

The connectivity considerations document discusses using GPS data for predicting connectivity conditions, and pre-emptively switching to a different connection before entering an area with bad coverage, for instance. This seems to be a risky strategy unless very up-to-date and very extensive data are accessible to the system at all times. In any case, the GeoClue framework could serve the purpose of providing the location information from the GPS.

One additional advantage of using GeoClue is that it supports using different providers for its information, like cell towers through oFono-based gsmloc, WiFi networks through integration with ConnMan, IP addresses through HostIP, and so on. Note that HostIP is essentially useless for mobile use cases, since it tries to use the IP address as an indication of the location, which is not very accurate in general, and even less so for mobile use.

> A somewhat outdated list: <http://www.freedesktop.org/wiki/Software/GeoClue/Providers>

Those providers could be useful to provide location information with coarse accuracy for applications such as the web browser, with no need for turning the GPS on, or for systems with no GPS hardware. If only GPS matters, and tighter control is required, Collabora can support using the GPS service directly through [gpsd] or [gypsy].

If different accuracy levels are to be defined for use by store applications, different permissions for GPS access can be created, representing the different levels of accuracy. These permissions would be specified in the application's manifest and prompted to the user at the moment the user authorizes the installation of an application. Refer to the Applications design for more details on this.

## Media Downloading

This chapter discusses how communication with various devices is made to provide the Apertis system with ways of downloading media from them. For more detailed information on use cases, requirements, problems and solutions refer to the Media Management design document (sometimes called the Media and Indexing design) prepared by Collabora.

In general media will be brought into the system through USB sticks, mobile devices and online sources. USB sticks are mounted using the USB mass-storage support. Most USB sticks use the VFAT file system, which is, unfortunately, patent-encumbered and has been used in the past by Microsoft to promote lawsuits against companies shipping devices that support the file system, see <http://www.groklaw.net/article.php?story=20090401152339514>.
When considering communication with specific devices the ones that stand out are the Apple devices, which are both very well-known and have widespread usage. This document discusses the Open Source tools that are currently the state of the art for communicating with Apple devices but does not specifically recommend their usage.

The [libimobiledevice](http://www.libimobiledevice.org/) suite is the state of the art in Open Source libraries and tools for accessing Apple devices. It implements the protocols used for communication with Apple's iPhone, iPad, iPod Touch and TV products, covering almost all available functionality, including downloading of music and video, when used in conjunction with libgpod from the gtkpod project. Its pairing tool is also a requirement for using the ipheth driver mentioned before.

The project is a community effort and although it does not require the devices to be jail-broken, it's not supported by Apple, which means the protocol is reverse-engineered and often lags behind recent Apple releases. As an example, iOS 5 support has only recently (22/03/2012) seen the light of day in a release. Despite these shortcomings, the suite would provide the technical means for writing the applications that interact with Apple products.

Microsoft has also developed a protocol for media exchange called Media Transfer Protocol (MTP). This protocol has been standardized and is currently published by the [USB implementers forum]. An LGPL-licensed library exists that supports the *Initiator* side of the communication, meaning it is able to access media on devices that support the MTP *Responder* side: [libmtp]. The libmtp library is currently shipped as a part of Ubuntu and can be provided in the Apertis middleware platform to be used to implement applications. Note that, as is the case for Apple devices, Collabora is unable to provide legal counselling about the use of libmtp.
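As a rough illustration (not a recommendation), listing the tracks on an MTP device with libmtp could look like the following sketch; error handling and memory management are abbreviated.

```c
#include <libmtp.h>
#include <stdio.h>

int
main (void)
{
  LIBMTP_Init ();

  LIBMTP_mtpdevice_t *device = LIBMTP_Get_First_Device ();
  if (device == NULL)
    {
      fprintf (stderr, "No MTP device found\n");
      return 1;
    }

  /* Walk the device's track list and print artist/title pairs. */
  LIBMTP_track_t *tracks =
      LIBMTP_Get_Tracklisting_With_Callback (device, NULL, NULL);
  for (LIBMTP_track_t *t = tracks; t != NULL; t = t->next)
    printf ("%s - %s\n", t->artist ? t->artist : "?",
            t->title ? t->title : "?");
  /* Real code must also free each entry with LIBMTP_destroy_track_t(). */

  LIBMTP_Release_Device (device);
  return 0;
}
```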
### UPnP

Universal Plug and Play ([UPnP]) is a protocol used for discovering and browsing multimedia content made available by media centers. This protocol will be supported by the Apertis middleware platform using the gupnp library. For more information about this please see the Media Management design document (sometimes referred to as the Media/Indexing design).

[ConnMan]: http://connman.net/

[oFono]: http://ofono.org/

[BlueZ]: http://bluez.org/

[connman-networking]: http://git.kernel.org/?p=network/connman/connman.git;a=blob;f=doc/session-api.txt;h=e19c6bfe0b6e6fc379cfa3632ac21f338610717d;hb=HEAD

[NET_RAW capability]: http://git.kernel.org/?p=linux/kernel/git/next/linux-next.git;a=blob;f=net/core/sock.c;h=b374899aecb6ea3a8590ae9ccdbb3e60225561d4;hb=HEAD#l470

[RFC2616]: http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.35

[Telepathy]: http://telepathy.freedesktop.org/wiki/

[telepathy-connection-manager]: http://telepathy.freedesktop.org/doc/book/sect.connection.connection-manager.html

[N9 devices]: http://techprolonged.com/index.php/2011/11/12/nokia-n9-a-complete-walk-through-meego-harmattan-software-and-user-interface-experience/#contacts-calling

[Telepathy wiki]: http://telepathy.freedesktop.org/wiki/Protocols%20Support

[ipheth]: http://giagio.com/wiki/moin.cgi/iPhoneEthernetDriver

[bluez-profiles]: http://www.bluez.org/profiles/

[PulseAudio streams]: http://freedesktop.org/software/pulseaudio/doxygen/proplist_8h.html#a87c586045175fa05e28e6ee1cbaac4de

[GeoClue]: http://www.freedesktop.org/wiki/Software/GeoClue

[gpsd]: https://savannah.nongnu.org/projects/gpsd

[gypsy]: http://gypsy.freedesktop.org/

[USB implementers forum]: http://www.usb.org/developers/devclass_docs/MTP_1.0.zip

[libmtp]: http://libmtp.sourceforge.net/

[UPnP]: http://www.upnp.org/

diff --git a/content/designs/contacts.md b/content/designs/contacts.md
new file mode 100644
index 0000000000000000000000000000000000000000..a5a0cb24946395478f806fd1873cdd10d0b91df5
--- /dev/null
+++ b/content/designs/contacts.md
@@ -0,0 +1,625 @@
---
title: Contacts
short-description: Design for address book contacts in Apertis
  (partially-implemented, libraries and services need to be glued into
  an integrated solution)
authors:
  - name: Travis Reitter
---

# Contacts

## Introduction

This document outlines our design for address book contacts within the Apertis system. It describes the many sources the system can draw upon for the user's contacts, how it will manage those contacts, which components will be necessary, and what work will be needed in the middleware space to enable these features.

Contacts are representations of people that contain some details about that person. These details are often directly actionable: phone numbers can be called, street addresses may be used as destinations for the navigation system. Other details, such as name and avatar, are purely representational.

We propose a contact system which uses the Folks contact aggregator to retrieve contacts from multiple sources and automatically match up contacts which correspond to a single person. This will give a thorough and coherent view of one's contacts with minimal effort required of the user.

## Integrated Address Book Versus Alternative Solutions

The following design is based around the concept of a heavily-integrated address book which links together contacts from many contact sources, providing a common interface for applications to access these contacts. As presented below, the only available contacts which will not be fully integrated into the common contact view will be contacts available on a paired Bluetooth device.

The level of contact source integration is flexible.
If it is preferred to limit contact integration to the local address book and chat/Voice-over-IP contacts (for example, to isolate Facebook or Twitter contacts in their own address book(s), to be accessed by a special library), Collabora is ready and able to adjust this design.

## Contact Sources

There are many potential sources for contacts, as people's contact details are frequently split over many services. The proposed system aggregates contacts from multiple sources in a way that is seamless to the user. See the [][Components] section on [][Folks] for more details of the components involved.

### Local Sources

New contacts may be created locally by importing contacts from a [][Bluetooth-paired phone] or a contact editor dialog (see [][User interfaces]).

These local contacts may contain a wide variety of detail types, including (but not limited to):

  - Full name
  - Phone numbers
  - Street addresses
  - Email addresses
  - Chat usernames on various services
  - User-selected groups
  - Notes

### Bluetooth-paired phone

#### Synchronization

Contacts may be simply synchronized to an Apertis system by means of a [SyncML] contact transfer from a phone paired with the Apertis system over Bluetooth. This operation is designed to intelligently merge fields added to contacts on the source phone to avoid creating duplicates.

To manage complexity, this function will only be supported from a phone to the Apertis system, not the other way around. Systems which support two-way contact synchronization have a number of issues to contend with, including:

  - Contacts do not contain “last modified” time stamps, so it is
    rarely obvious how to resolve conflicts
  - “Fuzzy-matching” fields for cases of equivalent names or phone
    numbers is not consistently implemented across different systems
    (if it is implemented at all)
  - Even if equivalent fields are correctly matched, it is not clear
    which version should be preferred
  - Because conflict resolution may not be symmetrical between the
    two directions of synchronization, the contacts in the two
    systems may never reach a stable state, potentially causing
    other side effects (such as duplicates on the phone)

By limiting synchronization from the phone to the Apertis instance (with a “source wins” conflict resolution policy), we can avoid the aforementioned issues and more. This simpler scheme will also be easier for users to understand, improving the user experience.

Synchronization will be performed automatically upon each connection of a phone to the Apertis system.

Each phone device will receive its own contact address book on the Apertis system, which will be created upon first connection and re-used upon subsequent connections. This is meant to make it trivial to remove old address books based upon storage requirements.

### Chat and Voice-over-IP Services

Most chat and some Voice-over-IP (VoIP) services maintain contact lists, so these are another potential source of contacts. We recommend supporting contacts from audio- and video-capable services, such as Session Initiation Protocol (SIP), Google Talk, and XMPP. These contacts and their services provide an alternative type of audio call which users may occasionally prefer to mobile phone calls for purposes of call quality and billing.

Additionally, contacts on some of these services may provide extended information, such as a street address, which the user might not otherwise have in their address book.
Our system will cache these contacts and their avatars from the service contact list. This will allow Apertis applications to always display these contacts. When the user attempts to call a chat/VoIP contact while offline, the system may prompt the user to go online and connect that account to complete the action.

From a user perspective, the configuration of chat and VoIP accounts within Apertis would be simple. In most cases, just providing a username and password will add that user's tens or hundreds of service contacts to the local address book. For limited effort, this can significantly increase the ways the user can reach their acquaintances in the future.

### Web services

The growing number of web services with social networking is yet another source of contacts for many users. Some services may provide useful contact information, such as postal addresses or phone numbers. In these cases, it may be worthwhile to include web service contacts (since implementations for some services already exist within [][Folks] and [][libsocialweb]).

In the case of multi-seat configurations, it may also be worthwhile to support additional web services for entertainment purposes. Potential uses include playback of contacts' YouTube videos, reading through contacts' Facebook status updates, Twitter tweets, and other use cases which do not apply to a driver due to their attention requirements.

In general, web services require third parties to access their content through a specially-issued developer key. In many cases, this will require securing license agreements with the provider to guarantee reliable service, as their terms of service change frequently (usually toward less access).

Our system will cache these contacts and their avatars from the service contact list. This will allow Apertis applications to always display these contacts, even when offline.

### SIM Card

Contacts may be retrieved from a SIM card within a vehicle's built-in mobile phone stack. These contacts will be accessible from the Apertis contacts system. However, any changes to these contacts will not be written back to the SIM card. See [][Read-only operation for external sources].

### Read-only Operation for External Sources

Modifications of contacts will be limited to [][Local sources]. Depending upon the user interfaces created, users may be able to set details upon local contacts which may appear to affect external contacts such as web service contacts or Bluetooth-connected phone contacts. However, these changes will not actually be written to the corresponding contact on the external source.

## Standard Behavior and Operations

### Contact Management

Our proposed system will support adding, editing, and removing contacts. New contacts will be added to [][Local sources]. Though the [][Components] which will enable contact management already support these features, [][User interfaces] will need to be implemented to present these functions to the user. Similarly, contacts will need to be presented as necessary by end-user applications.
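As a hedged illustration of the kind of code involved, the sketch below adds a contact to a local Evolution Data Server address book using libebook (one of the components described later in this document). It assumes an EBookClient has already been opened on the local address book, and libebook signatures vary somewhat across EDS versions.

```c
/* Modern EDS exposes an umbrella header; older versions used
 * <libebook/e-book-client.h> and <libebook/e-contact.h> instead. */
#include <libebook/libebook.h>

static void
add_local_contact (EBookClient *client)
{
  GError *error = NULL;
  gchar *uid = NULL;

  EContact *contact = e_contact_new ();
  e_contact_set (contact, E_CONTACT_FULL_NAME, "Robert Doe");
  e_contact_set (contact, E_CONTACT_PHONE_MOBILE, "+15551234567");

  /* Sketched against EDS 3.x; newer releases add an operation-flags
   * argument to this call. */
  if (!e_book_client_add_contact_sync (client, contact, &uid, NULL, &error))
    {
      g_warning ("Failed to add contact: %s", error->message);
      g_error_free (error);
    }

  g_free (uid);
  g_object_unref (contact);
}
```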
### Contact Aggregation and Linking

Contacts will be automatically aggregated into “meta-contacts” which contain the combined details of all their sub-contacts. The criteria for matching up contacts will be:

  - **Equivalent identifier fields** – for instance, two contacts
    with the email address <bob@example.com> or phone numbers
    “+18001234567” and “1-800-123-4567”
  - **Similar name fields** – for instance, contacts with the full
    names “Robert Doe”, “Rob Doe”, and “Bob Doe” (which all contain
    variations of the same given name)

This system will be careful to avoid matching upon unverified fields which would allow a remote contact to spoof their identity for the purpose of being matched with another contact. In a real-world example, Facebook contacts may claim to own any chat name (even those which belong to other people). If we automatically matched upon this field, they could, theoretically, initiate a phone call and appear to the user as that other person.

The user will also be able to manually “link” together any contacts or, similarly, manually “anti-link” any contacts which are accidentally mismatched through the automatic process.

Linking and anti-linking will be reversible operations. This will avoid a user experience issue found in some contact aggregation systems, such as the one used on the Nokia N900.

### Local Address Book Management

The Apertis contacts system will support adding and removing local contact stores in an abstract way that does not assume prior knowledge of the underlying address book store. In other words, to add or remove an underlying Evolution Data Server contact database, a client application will be able to use functionality within Folks and, indeed, not even need to know how the contacts are stored.

### Search

This contact system will include the ability to search for contacts by text. Search results will be drawn from all available contact sources and will support “fuzzy” matching where appropriate. For instance, a search for the phone number “(555) 123-4567” will return a contact with the phone number “+15551234567” and a search for the name “Rob” will match a contact named “Robert.”

Each type of contact detail field supports checking for both equality (for example, “Alice” ≠ “Carol”) and equivalence (for example, the phone number “(555) 456 7890” is equivalent to “4567890”). This allows the contact system to add or change fuzzy matching for fields without needing to break API or treat certain field details specially based upon their names.

#### Sorting and Pagination

As a convenience for applications and potentially an optimization, the contacts system will support returning search results in sorted order (for example, by first name).

Furthermore, the search system will support returning a limited number of results at a time (“paginating” the result set). This may improve performance for user interfaces which only require a small number of results at once.

### Event Logging

Related to the contacts system, Collabora will provide an event logging service which logs simple, direct communication between the user and their contacts. Supported events include VoIP and standard mobile phone calls, SMS messages, and chat conversations.
Events will include at least the following fields:

  - **User Account ID** – e.g., “+15551234567”,
    “<alice@example.jabber.org>”
  - **Contact service ID** – the unique ID of the contact involved
  - **Direction** – sent or received
  - **Event type** – call, text message
  - **Timestamp**
  - **Message content** – for text messages of any type
  - **Success** – whether the call successfully connected, whether a
    text message was successfully sent

The contact service ID can be used by applications to look up extended information from the contacts system, such as full names and avatars. These details can then be displayed within the application to provide a consistent view of contacts when displaying their conversations.

#### Out of Scope

Email conversations will be out of scope due to their relatively large message sizes and their common use for indirect conversations (such as mailing list messages, advertisements or promotions, social networking status updates, and so on).

Message exchanges with web service contacts will not be supported by default. However, the event logging service will allow third-party software to add events to the database. So events not logged by default by the middleware may be added by entirely third-party applications.

### Caching

In general, contact sources will be responsible for maintaining their own cache in a way that is transparent to client applications.

#### Opportunistic Caching

It may be best to defer bandwidth-intensive operations (such as full contact list and avatar downloads) until the Apertis system can connect to an accessible WiFi network (such as the user's home or work network).

#### Open Questions

Will there be a general framework for libraries and applications to check whether network data should be considered “cheap” or “too expensive”? And should the contacts system factor that into its network operations?

Most bare contact lists (not including avatars) have trivial data length. For example, my very large Google contacts list of 1,600 contacts only contains 171 kilobytes of data. Common contact lists are substantially smaller than that.

When factoring in avatars (for the first contact list download), contact list sizes can potentially reach a few megabytes in the worst case. This could be an unacceptable amount of data to transfer on a pay-as-you-go data plan. But at the same time, this is a relatively small amount of data and will only get relatively smaller as data service plans improve.

Considering the factors above, would it be worthwhile for the contacts system to support opportunistically caching remote contact lists on bandwidth-limited networks?

## Components

### Folks

[Folks] is a contact management library (libfolks) and set of backends for different contact sources. One of Folks' core features is the ability to aggregate meta-contacts from different contacts (which may come from multiple backends). These meta-contacts give a high-level view of people within the address book, making it easy to select the best method of communication when needed. For instance, the driver could just as easily call someone by their SIP address as their mobile phone if they prefer it for call quality or billing reasons.

The actively-maintained Folks backends include:

  - **Telepathy** – Chat and audio/video call contacts, including
    Google Talk, Facebook, and SIP
  - **Evolution Data Server (EDS)** – Local address book contacts
  - **libsocialweb** – Web service contacts, including YouTube and
    Flickr

Many of these backends have associated utility libraries which allow client software to access contact features which are unique to that service. For instance, the Telepathy backend library provides Telepathy contacts, which may be used to initiate phone calls.
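To give a feel for the API, the heavily hedged sketch below starts Folks aggregation from C. The C API is generated from the Vala sources and follows GObject conventions, but exact names may differ between Folks versions (for example, newer releases provide folks_individual_aggregator_dup() instead of the plain constructor).

```c
#include <folks/folks.h>

static void
prepared_cb (GObject *source, GAsyncResult *result, gpointer user_data)
{
  GError *error = NULL;

  folks_individual_aggregator_prepare_finish (
      FOLKS_INDIVIDUAL_AGGREGATOR (source), result, &error);
  if (error != NULL)
    {
      g_warning ("Failed to prepare aggregator: %s", error->message);
      g_error_free (error);
      return;
    }

  /* From here on, the aggregator's individuals map is populated and
   * kept up to date as backends come and go. */
}

int
main (void)
{
  FolksIndividualAggregator *aggregator = folks_individual_aggregator_new ();
  folks_individual_aggregator_prepare (aggregator, prepared_cb, NULL);

  GMainLoop *loop = g_main_loop_new (NULL, FALSE);
  g_main_loop_run (loop);
  return 0;
}
```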
#### Bindings

The Folks libraries have native bindings for both the C/C++ and Vala programming languages. There is also support for bindings for any languages supported by GObject Introspection (including Python, Javascript, and other languages), though this approach has less real-world testing than the C/C++ and Vala bindings.

#### Required work

As described in [][Contact aggregation and linking], our system will support automatic linking of contacts as well as anti-linking (for mismatched automatic links). Folks currently supports recommending links but does not yet act upon these recommendations automatically, so this would need to be implemented.

Along with this, Folks will need the ability to mark contacts specifically as non-matches (by anti-linking them). There is preliminary code for this feature, but it will need to be completed.

In order to enable display of chat/VoIP contacts while offline, we will need to implement a chat/VoIP contact list cache within Folks. This will be similar to existing code for caching avatars, but simpler.

Similarly, we will need to implement a web service contact cache to display web service contacts while offline.

Search functionality in Folks is nearly complete but still needs to be merged to [mainline][folks-need-merge]. Additionally, the ability to perform “deep” searches will require support for [search-only backends].

The search functionality will also need to support sorting and pagination as described in [][Sorting and pagination] before it can be merged upstream.

Folks external contact sources will need the ability to be designated as “synchronize-only” or “keep-remote”. Contact sources designated as synchronize-only will be automatically synchronized as necessary (such as when a phone is connected over Bluetooth). Keep-remote sources will not be synchronized to the Apertis system and will only be accessible while the remote source is available (whether over a local or Internet connection).

For Folks to access contacts stored on a vehicle's built-in SIM card, we will need to write an oFono backend to retrieve the contacts from that hardware.

Abstract contact address book creation and deletion within Folks will require new work.

In case [][Opportunistic caching] is required for the contacts system, this will need to be added as a new feature to Folks and its Telepathy and libsocialweb backends.

Support for storing arbitrary data in contacts has not yet been implemented in Folks, but has already been [discussed][folks-data-storage] and will be implemented.

#### Out of scope

We recommend application logic for synchronizing an entire address book from a Bluetooth-paired phone be implemented in a new library or application on top of SyncEvolution (which we will provide in our Reference images). The contacts created in this process will automatically be stored as any other local contact.
Speech-based search has been identified as a major use case for the address book software in Apertis. The text-based search portion of this use case will be supported by Folks; however, the parsing of audio data into text for searching will be the responsibility of specific software above the middleware. Global search in general will be covered in the upcoming document “Apertis Global Search”.

Collabora recommends implementing the voice search in whole or in part as a service daemon started automatically upon boot. This would allow dependent functionality, including Folks, to be initialized in advance of user interaction. This will be necessary to minimize latency between voice search and the display of results.

Support for contact caching for abstract third-party backends certainly would be possible and would likely take the form of a vCard contact store. However, at this time, Collabora recommends not implementing this feature. We would much prefer to delay this until there exist at least two third-party Folks backends with which to test this functionality during development. This is primarily due to the risks involved with committing to an API. Once officially released, this API will need to be kept stable. So it is critical that the API be tested by multiple independent code bases before finalization. Furthermore, at this time, there exist no known third-party Folks backends. In the meantime, third-party backends could still implement opaque contact caches suited to their own needs and migrate to a centralized implementation if and when it is created.

### Telepathy

The [Telepathy] communications framework, which Collabora created and maintains, retrieves contacts for many types of chat services, including Google Talk, Facebook, XMPP, and most other popular chat services. It also supports audio and video calls over SIP, standard mobile phone services, and the previously-mentioned chat services (depending upon provider).

### Evolution Data Server (EDS)

Evolution Data Server is a service which stores local address book contacts and can retrieve contacts stored in Google accounts or remote LDAP contact stores. Contacts may contain all defined and [arbitrary][RFC2426] [vCard] attributes and parameters; vCard is a common contact exchange format in address book systems. This allows Folks to store and retrieve contacts with many types of details.

EDS is the official address book store for the Gnome Desktop and has been used in Nokia's internet tablet devices and N900 mobile phone. It has been the default storage backend for Folks since Gnome 3.2, which was released in September 2011.

### libsocialweb

In the case that we support web service contacts, libsocialweb will be the component that provides these contacts through its Folks backend. Note that exactly which web services can be used depends upon both implementation in libsocialweb and license agreements with those services. See [][Web services] for more details.

### SyncEvolution

[SyncEvolution] is a service which supports synchronizing address books between two sources. While it supports many protocols and storage services, it best supports synchronizing contacts from a SyncML client over Bluetooth to Evolution Data Server, which will be our primary contact store. Many mobile phones support the SyncML protocol as a means of contact synchronization.
+
+This method requires Bluetooth [OBEX] data transfer support, which
+is widely supported by most Bluetooth stacks, including [BlueZ].
+
+### Zeitgeist
+
+[Zeitgeist] is open source event-tracking software that will serve
+as the [][Event logging] service for Apertis. It is a flexible event
+store which lets other services record their events in a central
+location. So, by its nature, it supports third-party applications
+without prior knowledge of them.
+
+Zeitgeist is committed to API stability in part because Ubuntu's Unity
+user interface depends upon it.
+
+#### Required Work
+
+A simple service to monitor and send Telepathy chat and VoIP call events
+to Zeitgeist is in progress, so this work will need to be finished and
+merged upstream.
+
+## Architecture
+
+In our recommended architecture, contacts applications will use libfolks
+directly. Libfolks, in turn, will use its Telepathy backend for chat and
+VoIP service contacts, its Evolution Data Server backend for local
+contacts, and its libsocialweb backend for web service contacts.
+
+Not pictured is the optional linking between the application and each
+backend's utility library (for accessing service-specific contact
+features).
+
+### Accessibility of Contacts By Source
+
+Contacts within this system are accessible on two levels: Meta-contacts,
+representing an entire person, are available for all contacts in the
+system. Each meta-contact contains at least one contact. For many use
+cases, applications can work entirely with meta-contacts and ignore the
+underlying contacts. For use cases requiring service-specific
+functionality, such as initiating an audio call with a Telepathy
+contact, applications can iterate through a meta-contact's sub-contacts.
+
+Additionally, applications can access contacts for each user account.
+Each account has a corresponding contact store containing only the
+contacts for that account. So, an application could be written to
+display only contacts from a single account or service provider at a
+time (ignoring any parent meta-contacts if it instead wishes to work in
+terms of service contacts).
+
+## User interfaces
+
+As Folks and Telepathy are a set of libraries and low-level services,
+they do not provide user interfaces. There exist a few open source,
+actively-maintained applications based upon Folks and Telepathy:
+
+  - **Gnome Contacts** – an “address book” application which
+    supports contact management and searching
+  - **Empathy** – a chat application which provides a chat-style
+    contact list and both audio/video call and chat handler programs
+
+Together, these components provide most contact functionality including:
+
+  - Adding new contacts
+  - Editing or removing contacts
+  - Browsing/searching through contacts
+  - Importing contacts from a Bluetooth-paired phone
+  - Initiating and accepting incoming phone calls
+
+However, these applications are designed for use in a typical desktop
+environment and do not suit the needs of an in-vehicle infotainment user
+experience. We recommend examining these applications as real-world
+examples of contact applications which use the components we recommend
+for the Apertis contacts system.
+
+## Multiple Users
+
+Each user in the system will have their own contacts database, chat/VoIP
+accounts, and web service accounts. Changes by one user will not affect
+the contacts or accounts of another user.
+
+## Storage considerations
+
+The storage requirements for our proposed contacts system will be very
+modest.
Storage of local address book contacts should require only a few
+megabytes even for large sets of contacts, with up to several more
+megabytes of storage for contacts' avatars.
+
+These storage requirements do not factor in files received from
+contacts.
+
+## Abstracting Contacts Libraries
+
+In general, Collabora discourages direct, complete abstractions of
+libraries because the resulting library tends to have fewer features,
+more bugs, and gives its users less control than the libraries it's
+meant to abstract. Particularly, when abstracting two similar libraries,
+the resultant library contains the “least common denominator” of the
+original libraries' features.
+
+However, partial-abstraction “utility” libraries which simplify common
+use patterns can prove useful for limited domains. For instance, if many
+applications required the ability to simply play an audio file without
+extended multimedia capabilities, a utility library could dramatically
+simplify the API for these applications.
+
+As such, Collabora recommends against abstracting Folks or Zeitgeist on
+a per-component basis as they are designed to be relatively easy to
+integrate into applications. But, for example, it would make sense to
+create a library or two which provide widgets based upon these
+libraries. Such a library could provide a contact selector widget built
+on top of Folks, allowing applications to prompt the user to pick a
+contact with only a small amount of code.
+
+Another recommended widget to add to such a library is a “type-ahead”
+contact selector as is common in many email applications. As the user
+types into a “To:” entry field, the widget would use the Folks search
+capabilities to return a list of suggestions for the user to select
+from.
+
+[SyncML]: http://en.wikipedia.org/wiki/SyncML
+
+[Folks]: http://telepathy.freedesktop.org/wiki/Folks
+
+[folks-need-merge]: https://bugzilla.gnome.org/show_bug.cgi?id=646808
+
+[search-only backends]: https://bugzilla.gnome.org/show_bug.cgi?id=660299
+
+[folks-data-storage]: https://bugzilla.gnome.org/show_bug.cgi?id=641211
+
+[Telepathy]: http://telepathy.freedesktop.org/wiki/
+
+[RFC2426]: http://www.ietf.org/rfc/rfc2426.txt
+
+[vCard]: http://en.wikipedia.org/wiki/VCard
+
+[SyncEvolution]: http://syncevolution.org/
+
+[OBEX]: http://en.wikipedia.org/wiki/OBEX
+
+[BlueZ]: http://www.bluez.org/
+
+[Zeitgeist]: http://zeitgeist-project.com/
diff --git a/content/designs/contribution-checklist.md b/content/designs/contribution-checklist.md
new file mode 100644
index 0000000000000000000000000000000000000000..91cc93cc8aadcd19cf5ed1775f1812ab5c126abb
--- /dev/null
+++ b/content/designs/contribution-checklist.md
@@ -0,0 +1,253 @@
+# Submission Checklist
+
+## Summary
+
+Before submitting a large commit, go through the checklist below to ensure it
+meets all the requirements. If submitting a large patchset, submit it in parts,
+and ensure that feedback from reviews of the first few parts of the patchset is
+applied to subsequent parts.
+
+## Overall principles
+
+In order to break sets of reviews down into chunks, there are a few key
+principles we need to stick to:
+* Review patches which are as small as possible, but no smaller (see
+  [here](https://developer.gnome.org/programming-guidelines/stable/version-control.html.en#guidelines-for-making-commits),
+  [here](https://crealytics.com/blog/2010/07/09/5-reasons-keeping-git-commits-small/),
+  and [here](http://who-t.blogspot.co.uk/2009/12/on-commit-messages.html))
+* Learn from each review so the same review comments do not need to be made
+  more than once
+* Use automated tools to eliminate many of the repetitive and time-consuming
+  parts of patch review and rework
+* Do high-level API review first, ideally before implementing those APIs, to
+  avoid unnecessary rework
+
+## Order of reviewing commits
+
+Before proposing a large patchset, work out what order the patches should be
+submitted in. If multiple new classes are being submitted, they should be
+submitted depth-first, as the APIs of the root classes affect the
+implementation of everything else.
+
+The high-level API of any major new feature should be reviewed first, then –
+only once that high-level API has been reviewed – its implementation. There is
+no point in starting to review the implementation before the high-level API, as
+the high-level review could suggest some big changes which invalidate a lot of
+the implementation review.
+
+## Revisiting earlier commits
+
+Rather than trying to get everything comprehensive first time, we should aim to
+get everything correct and minimal first time. This is especially important for
+base classes. The commit which introduces a new base class should be fairly
+minimal, and subsequent commits can add functionality to it as that
+functionality becomes needed by new class implementations.
+
+The aim here is to reduce the amount of initial review needed on base classes,
+and to ensure that the non-core parts of the API are motivated by specific
+needs in subclasses, rather than being added speculatively.
+
+## Automated tests
+
+One of the checklist items requires checking the code coverage of the automated
+tests for a class. We are explicitly not requiring that the code coverage
+reaches some target value, as the appropriateness of this value would vary
+wildly between patches and classes. Instead, we require that the code coverage
+report (`lcov` output) is checked for each patch, and that the developer thinks
+about whether it would be easy to add additional automated tests to increase
+the coverage for the code in that patch.
+
+## Pre-submission checklist
+
+(A rationale for each of these points is given in the section below to avoid
+cluttering this one.)
+
+Before submitting any patch, please make sure that it passes this checklist, to
+avoid the review getting hung up on avoidable issues:
+1. All new code follows the
+   [coding guidelines](https://designs.apertis.org/latest/coding_conventions.html),
+   especially the
+   [API design guidelines](https://wiki.apertis.org/Guidelines/API_design),
+   [namespacing guidelines](https://developer.gnome.org/programming-guidelines/unstable/namespacing.html.en),
+   [memory management guidelines](https://developer.gnome.org/programming-guidelines/unstable/memory-management.html.en),
+   [pre- and post-condition guidelines](https://developer.gnome.org/programming-guidelines/unstable/preconditions.html.en),
+   and
+   [introspection guidelines](https://developer.gnome.org/programming-guidelines/unstable/introspection.html.en)
+   — some key points from these are pulled out below, but these are not the
+   only points to pay attention to.
+1. All new public API must be
+   [namespaced correctly](https://developer.gnome.org/programming-guidelines/unstable/namespacing.html.en).
+1. All new public API must have a complete and useful
+   [documentation comment](https://wiki.apertis.org/Guidelines/API_documentation)
+   (ignore the build system comments on that page – we use hotdoc now – the
+   guidelines about the comments themselves are all still relevant).
+1. All new public API documentation comments must have
+   [GObject Introspection annotations](https://wiki.gnome.org/Projects/GObjectIntrospection/Annotations)
+   where appropriate; `g-ir-scanner` (part of the build process) must emit no
+   warnings when run with `--warn-all --warn-error` (which should be set by
+   `$(WARN_SCANNERFLAGS)` from `AX_COMPILER_FLAGS`).
+1. All new public methods must have
+   [pre- and post-conditions](https://developer.gnome.org/programming-guidelines/unstable/preconditions.html.en)
+   to enforce constraints on the accepted parameter values.
+1. The code must compile without warnings, after ensuring that
+   `AX_COMPILER_FLAGS` is used *and enabled* in `configure.ac` (if it is
+   correctly enabled, compiling liblightwood should fail if there are any
+   compiler warnings) — remember to add `$(WARN_CFLAGS)`, `$(WARN_LDFLAGS)`
+   and `$(WARN_SCANNERFLAGS)` to new `Makefile.am` targets as appropriate.
+1. The introductory documentation comment for each new class must give a usage
+   example for that class in each of the main modes it can be used (for
+   example, if done for the roller, there would be one example for fixed mode,
+   one for variable mode, one for linked rollers, one for each animation mode,
+   etc.).
+1. All new code must be formatted as per the
+   [coding guidelines](https://designs.apertis.org/latest/coding_conventions.html#code-formatting),
+   using
+   [`clang-format`](https://designs.apertis.org/latest/coding_conventions.html#reformatting-code)
+   not GNU `indent`.
+1. There must be an example program for each new class, which can be used to
+   manually test all of the class’s main modes of operation (for example, if
+   done for the roller, there would be one example for fixed mode, one for
+   variable mode, one for linked rollers, one for each animation mode, etc.) —
+   these examples may be submitted in a separate patch from the class
+   implementation, but must be submitted at the same time as the
+   implementation in order to allow review in parallel. Example programs must
+   be usable when installed or uninstalled, so they can be used during
+   development and on production machines.
+1. There must be automated tests (using the
+   [`GTest` framework in GLib](https://developer.gnome.org/glib/stable/glib-Testing.html))
+   for construction of each new class, and for getting and setting each of its
+   properties.
+1. The code coverage of the automated tests must be checked (using
+   `make check-code-coverage`; see D3673) before submission, and if it’s
+   possible to add more automated tests (and for them to be reliable) to
+   improve the coverage, this should be done; the final code coverage figure
+   for the class should be mentioned in a comment on the diff, and it would be
+   helpful to have the `lcov` reports for the class saved somewhere for
+   analysis as part of the review.
+1. There must be no definite memory leaks reported by Valgrind when running
+   the automated tests under it (using `AX_VALGRIND_CHECK` and
+   `make check-valgrind`; see D3673).
+1. All automated tests must be installed as
+   [installed-tests](https://wiki.gnome.org/Initiatives/GnomeGoals/InstalledTests)
+   and must be
+   [run when liblightwood is built into a package](https://git.apertis.org/cgit/rhosydd.git/tree/debian/tests/gnome-desktop-testing)
+   (we can help with the initial setup of this infrastructure if needed).
+1. `build-snapshot -Is $build_machine` must pass before submission of any
+   patch (where `$build_machine` is a machine running an up-to-date copy of
+   Apertis, which may be `localhost` — this is a standard usage of
+   `build-snapshot`).
+1. All new code must be checked to ensure it doesn’t contradict review
+   comments from previous reviews of other classes (i.e. we want to avoid
+   making the same review comments on every submitted class).
+1. Commit messages must
+   [explain *why* they make the changes they do](http://chris.beams.io/posts/git-commit/).
+1. The dependency information between Phabricator diffs must be checked after
+   submitting diffs, to ensure it is in the correct order.
+1. All changes must be documented, including on the wiki and the appdev
+   portal.
+1. Grammar and spelling should be checked with an automated tool for typos
+   and mistakes (where appropriate).
+
+## Rationales
+
+1. Each coding guideline has its own rationale for why it’s useful, and many
+   of them significantly affect the structure of a diff, so are important to
+   get right early on.
+1. Namespacing is important for the correct functioning of a lot of the
+   developer tools (for example, GObject Introspection), and to avoid symbol
+   collisions between libraries — checking it is a very mechanical process
+   which it is best to not have to spend review time on.
+1. Documentation comments are useful to both the reviewer and to end users of
+   the API — for the reviewer, they act as an explanation of why a particular
+   API is necessary, how it is meant to be used, and can provide insight into
+   implementation choices. These are questions which the reviewer would
+   otherwise have to ask in the review, so writing them up lucidly in a
+   documentation comment saves time in the long run.
+1. GObject Introspection annotations are a requirement for the platform’s
+   language bindings (to JavaScript, for example) to work, so must be added at
+   some point. Fixing the error messages from `g-ir-scanner` is sufficient to
+   ensure that the API can be introspected.
+1. Pre- and post-conditions are a form of assertion in the code, which check
+   for programmer errors at runtime.
If they are used consistently throughout + the code on every API entry point, they can catch programmer errors much + nearer their origin than otherwise, which speeds up debugging both during + development of the library, and when end users are using the public APIs. + They also act as a loose form of documentation of what each API will allow + as its inputs and outputs, which helps review (see the comments about + documentation above). +1. The set of compiler warnings enabled by `AX_COMPILER_FLAGS` have been chosen + to balance + [false positives against false negatives](https://en.wikipedia.org/wiki/Type_I_and_type_II_errors) + in detecting bugs in the code. Each compiler warning typically identifies a + single bug in the code which would otherwise have to be fixed later in the + life of the library — fixing bugs later is always more expensive in terms of + debugging time. +1. Usage examples are another form of documentation (as discussed above), which + specifically make it clearer to a reviewer how a particular class is + intended to be used. In writing usage examples, the author of a patch can + often notice awkwardness in their API design, which can then be fixed before + review — this is faster than them being caught in review and sent back for + modification. +1. Well formatted code is a lot easier to read and review than poorly formatted + code. It allows the reviewer to think about the function of the code they + are reviewing, rather than (for example) which function call a given + argument actually applies to, or which block of code a statement is actually + part of. +1. Example programs are a loose form of testing, and also act as usage examples + and documentation for the class (see above). They provide an easy way for + the reviewer to run the class and (for example, if it is a widget) review + its visual appearance and interactivity, which is very hard to do by simply + looking at the code in a patch. Their biggest benefit will be when the class + is modified in future — the example programs can be used to test changes to + it and ensure that its behavior changes (or does not) as expected. + Availability of example programs which covered each of the modes of using + `LightwoodRoller` would have made it easier to test changes to the roller in + the last two releases, and discover that they broke some modes of operation + (like coupling two rollers). +1. For each unit test for a piece of code, the behavior checked by that unit + test can be guaranteed to be unchanged across modifications to the code in + future. This prevents regressions (especially as the unit tests for Apertis + projects are set up to be run automatically on each commit by + @apertis-qa-bot, which is more frequently than in other projects). The value + of unit tests when initially implementing a class is in the way they guide + API design to be testable in the first place. It is often the case that an + API will be written without unit tests, and later someone will try to add + unit tests and find that the API is untestable; typically because it relies + on internal state which the test harness cannot affect. By that point, the + API is stable and cannot be changed to allow testing. +1. Looking at code coverage reports is a good way to check that unit tests are + actually checking what they are expected to check about the code. Code + coverage provides a simple, coarse-grained metric of code quality — the + quality of untested code is unknown. +1. 
Every memory leak is a bug, and hence needs to be fixed at some point.
+   Checking for memory leaks in a code review is a very mechanical,
+   time-consuming process. If memory leaks can be detected automatically, by
+   using `valgrind` on the unit tests, this reduces the amount of time needed
+   to catch them during review. This is an area where higher code coverage
+   provides immediate benefits. Another way to avoid leaks is to use
+   [`g_autoptr()`](https://developer.gnome.org/glib/stable/glib-Miscellaneous-Macros.html#g-autoptr)
+   to automatically free memory when leaving a control block — however, as
+   this is a completely new technique to learn, we are not mandating its use
+   yet. You might find it easier, though.
+1. If all automated tests are run at package build time, they will be run by
+   @apertis-qa-bot for every patch submission, and can also be run as part of
+   the system-wide integration tests, to check that liblightwood behavior
+   doesn’t change when other system libraries (for example, Clutter or
+   libthornbury) are changed. This is one of the
+   [motivations behind installed-tests](https://wiki.gnome.org/Initiatives/GnomeGoals/InstalledTests#Issues_with_.22make_check.22).
+   This is a one-time setup needed for liblightwood, and once it’s set up,
+   does not need to be done for each commit.
+1. `build-snapshot` ensures that a Debian package can be built successfully
+   from the code, which also entails running all the unit tests, and checking
+   that examples compile. This is the canonical way to ensure that
+   liblightwood remains deliverable as a Debian package, which is important,
+   as the deliverable for Apertis is essentially a set of Debian packages.
+1. If each patch is updated to learn from the results of previous patch
+   reviews, the amount of time spent making and explaining repeated patch
+   review comments should be significantly reduced, which saves everyone’s
+   time.
+1. Commit messages are a form of documentation of the changes being made to a
+   project. They should explain the motivation behind the changes, and
+   clarify any design decisions which the author thinks the reviewer might
+   question. If a commit message is inadequate, the reviewer is going to ask
+   questions in the review which could have been avoided otherwise.
diff --git a/content/designs/contribution-process.md b/content/designs/contribution-process.md
new file mode 100644
index 0000000000000000000000000000000000000000..f804937ecaf9a96ae1bcf732c8b3a231a582189e
--- /dev/null
+++ b/content/designs/contribution-process.md
@@ -0,0 +1,400 @@
+# Contribution process
+
+This guide covers the expectations and processes for Apertis developers
+wishing to make contributions to the Apertis project and the wider open source
+ecosystem. These policies should be followed by all developers, including core
+and third party contributors.
+
+
+## Suitability of contributions
+
+Like most open source projects, Apertis requires that contributions be
+submitted via a process (which in the case of Apertis is defined below) to
+ensure that Apertis continues to meet its design goals and remains suitable
+for its community of users. In addition to design and technical implementation
+details, the suitability of contributions will be checked to meet requirements
+in areas such as [coding conventions](coding_conventions.md) and
+[licensing](license-expectations.md).
+
+
+### Upstream First Policy
+
+Apertis is a fully open source GNU/Linux distribution that carries a lot of
+components for which it is not the upstream.
The goal of [upstream first](upstreaming.md) is to minimize the amount of
+deviation and fragmentation between Apertis components and their upstreams.
+
+Deviation tends to duplicate work and adds a burden on the Apertis developers
+when it comes to testing and updating to newer versions of upstream
+components. Also, as the success of Apertis relies on the success of open
+source in general to accommodate new use cases, it is actively harmful for
+Apertis to not do its part in moving the state of the art forward.
+
+It is the intention of Apertis to utilize existing open source projects to
+provide the functionality required, where suitable solutions are available,
+over the creation of home-grown solutions that would fragment the GNU/Linux
+ecosystem further.
+
+This policy should be taken into consideration when submitting contributions
+to Apertis.
+
+
+### Upstream Early, Upstream Often
+
+One mantra that can often be heard in open source communities is "upstream
+early, upstream often". The approach that this espouses is to break down large
+changes into smaller chunks, attempting to upstream a minimal implementation
+before implementing the full breadth of planned features.
+
+Each open source community tends to be comprised of many developers, who share
+some overlap between their goals, but may have very different focuses. It is
+likely that other developers contributing to the project may have ideas about
+how the features that you are planning may be better implemented, for example
+to enable a broader set of use cases to utilise the feature. Submitting an
+early minimal implementation allows the general approach to be assessed,
+opinions to be sought and a consensus reached regarding the implementation. As
+it is likely that some changes will be required, a minimal implementation
+minimizes the effort required to take feedback into account.
+
+Taking this approach a step further, it can often be instructive to share your
+intention to implement larger features before starting. Such a conversation
+might be started by sending an email to the project's devel
+[mailing list](https://lists.apertis.org/) saying:
+
+```
+I'm attempting to use <project> to <task> for my project.
+
+I'm thinking about doing <brief technical overview> to enable this use case.
+
+I'm open to suggestions should there be a better way to solve this.
+
+Thanks,
+
+<developer>
+```
+
+This gives other experienced developers the chance to suggest approaches that
+may prove to be the most efficient, saving effort in implementation and later
+in review, or to point out missed existing functionality that can be used to
+solve a given need without needing substantial development effort.
+
+
+## Extending Apertis
+
+### Adding components to Apertis
+
+Apertis welcomes requests for new components to be added to the distribution
+and can act as a host for projects where required; however, the open source
+focus of Apertis should be kept in mind, and any proposed contributions need
+to both comply with Apertis policies and present a compelling argument for
+inclusion.
+
+Additional components can be categorised into three main groups:
+
+- Existing upstream component available in Debian stable (with a suitable
+  version)
+- Existing upstream component, not available in Debian stable
+- New component on gitlab.apertis.org
+
+There is a maintenance effort associated with any components added to
+Apertis, as any components added will need to be maintained within the
+Apertis ecosystem.
The effort required to maintain these different categories of components
+varies greatly. Prepackaged Debian components require a lot less
+maintenance effort than packaging other existing upstream components.
+Developing a new component on gitlab.apertis.org requires both the development
+and packaging/maintenance to be carried out within Apertis, significantly
+raising the effort required.
+
+When looking for ways to fulfil a requirement, there are a number of factors
+that will increase the probability of a solution being acceptable to Apertis.
+
+- Component already included in Debian stable: As Apertis is based on Debian
+  and already has processes in place to pull updates from this source, the
+  cost of inclusion is dramatically lower than maintaining packages drawn from
+  other sources, as a lot of the required effort to maintain the package is
+  being carried out within the Debian ecosystem.
+- Proven, actively maintained codebase: Poorly maintained codebases present a
+  risk to Apertis, increasing the chance that serious bugs or security holes
+  will go unnoticed. Picking a solution that has an active user base, a
+  developer community making frequent updates and/or a mature codebase that
+  has undergone significant "in the field" testing makes the solution more
+  attractive for inclusion in Apertis. It is understood that, whilst
+  extensive, the Debian repositories are not all-encompassing; if proposing an
+  existing open source component that isn't currently provided by Debian,
+  being able to show that it is actively maintained will be important.
+- Best solution: In general, there exist more open source solutions than there
+  are problems. To have a good chance of having a component included in
+  Apertis, it will be necessary to explain why the chosen solution represents
+  the best option for Apertis. What is "best" is often nuanced and will be
+  affected by a number of factors, including integration/overlap with existing
+  components and the size/number of dependencies it has (especially if they
+  aren't currently in Apertis). It may be that whilst a number of existing
+  solutions exist, none of them are a good fit for Apertis. This may suggest a
+  new component is the best solution, though adapting/extending one of the
+  existing solutions should also be considered.
+
+The Apertis distribution is supported by its members. As previously mentioned,
+in order to ensure that Apertis remains viable and correctly focused, it is
+important that any additions to the main
+[Apertis projects](https://wiki.apertis.org/Package_maintenance) are justified
+and can be shown to fill a specific and real use case. Maintaining the
+packaging, updating the codebases of which Apertis is comprised and performing
+testing on supported platforms is a large part of the effort needed to provide
+Apertis. As a result, it will be necessary to either provide a commitment to
+support any packages proposed for inclusion in the main Apertis projects or
+gain such a commitment from an existing member.
+
+The Apertis development team commit to maintaining the packages included in
+the reference images. Packages may be added to the main package repositories
+but not form part of the reference images. Such packages will be maintained on
+a best-effort basis: that is, as long as the effort remains reasonable, the
+Apertis team will attempt to keep the package in a buildable state; however,
+runtime testing will not be performed.
Should the package fail to build, or should runtime issues be reported
+which require significant effort to resolve, the original or subsequent
+users of the package may be approached to help resource fixing the
+package. Ultimately the package may be removed if a solution cannot be
+found. Likewise, should a different common solution for Apertis be chosen at a
+later date, the package may be deprecated and subsequently removed.
+
+Proposals for inclusion of new components are expected to be made in the form
+of a written proposal. Such a proposal should contain the following
+information:
+
+- Description of the problem which is being addressed
+- Why the functionality provided by the proposed component is useful to
+  Apertis and its audience
+- A review of the possible solutions and any advantages and disadvantages that
+  have been identified with them
+- Why the proposed solution is thought to present the best way forward, noting
+  the points made above where relevant
+- Whether any resources are to be made available to help maintain the
+  component.
+
+An alternative to adding packages to the main Apertis project is to apply to
+have a dedicated project area, where code specific to a given project can be
+stored. Such an area can be useful for providing components that are highly
+specific to a given project and/or as a staging area for modifications to core
+packages that might later get folded back into the main area, either by
+changes being submitted to the relevant Apertis component or after changes
+have been [upstreamed](upstreaming.md) to the component's main project. A
+dedicated area will allow a project group to iterate on key components more
+rapidly, as the changes made do not need to work across the various supported
+hardware platforms. It must be noted that whilst a dedicated project area
+would allow some requirements with regard to platform support to be ignored,
+packages in such areas would still be required to comply with other Apertis
+rules such as [open source licensing](license-expectations.md). It should be
+expected that the Apertis developers will take a very hands-off approach to
+the maintenance and testing of packages in such areas. If packages in such
+areas require work, the project maintainers will be contacted. The Apertis
+maintainers may, at their discretion, help with minor maintenance tasks should
+a package be of interest to the Apertis project. Packages that become
+unmaintained may be removed.
+
+Requests for dedicated project areas are also expected to be made in the form
+of a written proposal. Such a proposal should contain the following
+information:
+
+- Description of the project requiring a dedicated project area
+- Preferred name to be used to refer to the project
+- Expected use of the dedicated area
+- Expected lifetime of the project area
+- Contact details of project maintainers
+
+Such submissions should be made via the devel
+[mailing list](https://lists.apertis.org/).
+
+The submission should be discussed on the mailing list and must be agreed with
+the Apertis stakeholders.
+
+
+### Extending existing components
+
+Apertis carries a number of packages that have been modified compared to their
+upstream versions. It is fairly typical for distributions to need to make
+minor modifications to upstream sources to tailor them to the distribution;
+Apertis is no different in this regard.
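+
+When working with such a modified package, the downstream delta it carries can
+be inspected and exported with ordinary git tooling. As a purely illustrative
+sketch (the repository URL, component name and branch names here are
+hypothetical, not Apertis conventions):
+
+```
+# Clone the packaging repository for a component (hypothetical name)
+git clone https://gitlab.apertis.org/pkg/libexample.git
+cd libexample
+
+# List the commits carried on top of the upstream baseline, assuming an
+# 'upstream' branch tracking the unmodified upstream sources
+git log --oneline upstream..master
+
+# Export those commits as patches suitable for proposing upstream
+git format-patch upstream..master
+```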
+
+Whilst Apertis does accept changes to existing components, it needs to be
+acknowledged that this increases the effort required to maintain the package
+in question. It may be requested that an attempt be made to upstream the
+changes, in line with the [upstream first](#upstream-first-policy) policy,
+either to the package's upstream or to Debian. More guidance is provided in
+the [upstreaming](upstreaming.md) documentation.
+
+
+## Development Process
+
+The process for contributing code to Apertis is the same as for many other
+open source projects: check out the code, make changes locally, then submit
+them for review and merging. Apertis uses its GitLab instance as a code
+repository and review tool.
+
+This guide assumes good working knowledge of git (see
+[Version control](https://wiki.apertis.org/Guidelines/Version_control) for
+specific guidelines) and an account with commit rights on GitLab (see
+[Getting commit rights](#getting-commit-rights) for guidance).
+
+
+### Branch
+
+Propose changes by pushing a git branch to GitLab.
+
+* Check out the git repository:
+```
+git clone https://gitlab.apertis.org/[group name]/[repository name].git
+```
+* Create a local branch and make code changes there. The branch should have
+  the following naming convention: `wip/$username/$feature` or
+  `wip/$username/$taskid`. This branch should be pushed into the destination
+  repository: if developers lack permissions to do so, they can ask to be
+  granted the needed privileges on the developer
+  [mailing list](https://lists.apertis.org/) or push to a personal repository
+  instead.
+* Commit the changes (see the
+  [guidelines for making commits](https://wiki.apertis.org/Guidelines/Version_control#Guidelines_for_making_commits)),
+  test them, and check that you are allowed to submit them under the
+  project’s license, following the [sign-offs](#signoffs) documentation:
+```
+git commit -i -s
+```
+* Every commit must have an appropriate [`Signed-off-by:` tag](#signoffs) in
+  the commit message.
+* Add a `Fixes: APERTIS-<task_id>` tag for each task in the proposed commit
+  messages (as explained in the section "Automatically closing tasks" below or
+  in the envelope message of the merge request itself) in order to link the
+  merge request to one or more tasks in Phabricator.
+* Note: The tag will ensure that Phabricator tasks are kept up-to-date with
+  regard to the status of related merge requests, through the creation of a
+  new comment with the link to the merge request every time a merge request is
+  created/updated/merged. This syntax has been chosen for the tag because it
+  is already
+  [supported by GitLab](https://docs.gitlab.com/ce/user/project/integrations/custom_issue_tracker.html).
+
+
+### Merge Request
+
+* Once the changes are ready to be reviewed, create a
+  [merge request on GitLab](https://docs.gitlab.com/ce/user/project/merge_requests/).
+* The merge request should have the "Remove source branch when merge request
+  accepted" option enabled.
+* If changes are not ready and some (strongly encouraged) early feedback is
+  required, the merge request should be
+  [marked as Work In Progress (WIP)](https://docs.gitlab.com/ce/user/project/merge_requests/work_in_progress_merge_requests.html).
+
+
+### Review
+
+* Notify maintainers on the devel [mailing list](https://lists.apertis.org/)
+  when submitting or updating a merge request.
+* The repository maintainer or another developer will review the merge
+  request.
+* Reviews should happen on GitLab.
+* Reviewers can propose concrete code snippets that the submitter can decide
+  whether to integrate into their patch series:
+  * Reviewers can use code blocks in comments, and the submitter can copy and
+    paste them into their patches when updating the merge request
+  * Reviewers can edit the submitted branch with the GitLab Web IDE or by
+    checking it out manually, stacking new commits on top of the submitted
+    patches: it's up to the submitter to cherry-pick or reject the suggested
+    changes and squash them appropriately with the original commits before the
+    request can be merged
+  * If submitters do not agree with the suggested snippets, they need to
+    explain why they are rejecting the proposed changes
+* If changes are needed, a force-push to the submitted branch is required to
+  update the commits that are part of the merge request (always remember to
+  [use `--force-with-lease`](https://developer.atlassian.com/blog/2015/04/force-with-lease/)
+  instead of `--force`).
+* Questions, requests for clarification and any other kind of discussion
+  should be handled via comments on the merge request.
+* Repeat these review steps until all the changes are accepted.
+
+
+### Merge
+
+* A merge request must have no thumbs down to be merged, and all the
+  [discussions should be resolved](https://docs.gitlab.com/ce/user/project/merge_requests/#resolve-discussion-comments-in-merge-requests-reviews).
+* Reviewers that approve the general approach but still want to discuss some
+  details can add a thumb up, so other reviewers know that once all the
+  discussions are resolved the merge request can be approved.
+* Once all comments have been handled, the reviewer hits the merge button or
+  otherwise pushes the submitted commits to the appropriate branch.
+
+
+### Extra Actions
+
+* If other actions need to be taken manually when commits are landed (for
+  instance,
+  [triggering the `tag-release` pipeline](https://gitlab.apertis.org/infrastructure/ci-package-builder#landing-downstream-changes-to-the-main-archive)),
+  the developer accepting and merging the changes is responsible for ensuring
+  that those actions are taken, either by doing them themselves or by asking
+  someone else to do so.
+* If the merged commits need to be backported to other branches through
+  cherry-picks or merges, those should go through merge requests as well;
+  however, only changes related to the backport process are allowed: unrelated
+  changes should be applied with a separate merge request, possibly landing in
+  master first.
+
+
+### Release Commits
+
+The process for releasing a package is documented in the
+[ci-package-builder documentation](https://gitlab.apertis.org/infrastructure/ci-package-builder#landing-downstream-changes-to-the-main-archive).
+
+
+## Other important bits
+
+### Sign-offs
+
+Like the git project and the Linux kernel, Apertis requires all contributions
+to be signed off by someone who takes responsibility for the open source
+licensing of the code being contributed. The aim of this is to create an
+auditable chain of trust for the licensing of all code in the project.
+
+Each commit which is pushed to git master **must** have a `Signed-off-by`
+line, created by passing the `--signoff`/`-s` option to `git commit`. The line
+must give the real name of the person taking responsibility for that commit,
+and indicates that they have agreed to the
+[Developer Certificate of Origin](http://developercertificate.org/).
There may be multiple `Signed-off-by` lines for a commit, for example, by
+the developer who wrote the commit and by the maintainer who reviewed and
+pushed it:
+
+```
+Signed-off-by: Random J Developer <random@developer.example.org>
+Signed-off-by: Lucky K Maintainer <lucky@maintainer.example.org>
+```
+
+Apertis closely follows the Linux kernel process for sign-offs, which is
+described in section 11 of the
+[kernel guide to submitting patches](https://www.kernel.org/doc/Documentation/SubmittingPatches).
+
+
+### Privileged processes
+
+Pushing commits to gitlab.apertis.org requires commit rights which are only
+granted to trusted contributors (see
+[Getting commit rights](#getting-commit-rights) for how to get
+commit rights). All commits must have a
+[`Signed-off-by` line](#signoffs) assigning responsibility for their open
+source licensing.
+
+Packaging up and releasing new versions of Apertis modules as Debian packages
+requires packaging rights on build.collabora.co.uk (OBS), which are issued
+separately from commit rights.
+
+Submitting automated test runs on lava.collabora.co.uk requires CI rights,
+which are granted similarly to packaging rights. However, CI results may be
+viewed read-only by anyone.
+
+
+#### Getting commit rights
+
+Commit rights (to allow direct pushes to git, and potentially access to the
+package building system, build.collabora.co.uk) may be granted to trusted
+third-party contributors who regularly make high-quality contributions to
+Apertis, at the discretion of current Apertis maintainers.
+
+If you wish to acquire an account on the Apertis GitLab instance or other
+Apertis infrastructure, please send an email to
+`account-requests@apertis.org` including:
+
+- Your full name
+- The email address you prefer to be contacted through
+- The nickname/account name you wish to be known by on the Apertis GitLab
diff --git a/content/designs/debug-and-logging.md b/content/designs/debug-and-logging.md
new file mode 100644
index 0000000000000000000000000000000000000000..5a12af73e5208df45ecffe66d6e589f7e654dc87
--- /dev/null
+++ b/content/designs/debug-and-logging.md
@@ -0,0 +1,778 @@
+---
+title: Debug and logging
+short-description: Approaches to debugging components of an Apertis system
+  (general-design)
+authors:
+  - name: Philip Withnall
+---
+
+# Debug and logging
+
+## Introduction
+
+This document describes several approaches to debugging components of an
+Apertis system, either during development or in the field. This includes
+debugging tools for reproducing and analysing problems, and logging
+systems for gathering data about problems and about system behaviour.
+
+The major considerations with a debugging and logging system are:
+
+  - Reproducibility: Many of the hardest problems to diagnose are ones
+    which are hard to reproduce. A set of debugging tools should make it
+    easy to reproduce problems, and certainly should not make the
+    problems disappear when being debugged.
+
+  - Timing: An important part of ensuring that problems are reproducible
+    is ensuring that timing effects are reproducible, which means that a
+    debugging system must have a low (almost zero) overhead, in order to
+    avoid disturbing timing effects. Secondarily to this, it must allow
+    the developer to see the order in which events occurred during the
+    course of a problem.
+
+  - Context: As well as helping reproducibility of a problem, a
+    debugging system should reduce the need to reproduce the problem in
+    the first place — by capturing as much contextual information about
+    it as possible on the initial attempt at debugging.
+
+  - Confidentiality: Any system which logs information about a running
+    system must ensure that the logged data remains confidential, and
+    accessible only to the developers who need it for debugging. This
+    may mean that logging cannot be enabled on production systems.
+
+## Terminology and concepts
+
+### Application bundle
+
+An *application bundle* is a group of functionally related components
+(services, data or programs) installed as a unit. This matches the sense
+with which ‘app’ is typically used on mobile platforms such as Android
+and iOS. (See the Applications design document for the full definition.)
+
+### Component
+
+An application bundle or system service.
+
+### Trusted dealer
+
+An authorised vehicle dealer, garage or other sale or repair location
+which has a business relationship with the vehicle manufacturer.
+
+## Use cases
+
+A variety of use cases for scenarios where a component needs debugging,
+or where logging data are needed, are given below. Particularly
+important discussion points are highlighted at the bottom of each use
+case.
+
+Some of these cases may already be solved in the Apertis distribution in
+its current state. However, they will all have an effect, to a greater
+or lesser extent, on this design.
+
+### Debug deterministic application on SDK
+
+An application developer needs to be able to debug their application
+when running it on the SDK, diagnosing crashes and looking at log output
+for that particular application.
+
+### Debug non-deterministic application on SDK
+
+An application developer is working on an application whose behaviour
+appears non-deterministic (for example, due to using a lot of threads,
+or depending on sensitive timing). They manage to reproduce a particular
+bug only occasionally, but need to debug it further.
+
+### Debug application on target
+
+An application developer needs to be able to debug their application
+when running it on the target device (connected to an SDK machine during
+development), diagnosing crashes and looking at log output for that
+particular application.
+
+### Debug application in the context of the whole system
+
+An application developer has a problem with their application which is
+dependent on the state of the whole (integrated) target system, rather
+than just on internal state in their application. They need to be able
+to correlate system state with their application’s internal state.
+
+### Extract logs from a device under test
+
+An Apertis tester has observed a failure in a development vehicle while
+doing field testing on it. They need to be able to extract logs from the
+vehicle after the event, and examine them offline to diagnose the
+failure.
+
+### Trusted dealer can extract logs from a device post-production
+
+A vehicle owner has brought their vehicle into the garage with a failure
+in the IVI system. The trusted dealer at the garage extracts logs from
+the vehicle and passes them to the vehicle vendor for analysis,
+potentially leading to a fix for the problem in a subsequent release of
+the CE domain operating system for that vehicle.
+ +### Third party cannot extract logs from a device post-production + +A vehicle owner likes to tinker with their vehicle, and would like to +look at the logs which their trusted dealer can look at, in order to get +more information about reverse engineering the IVI system in their +vehicle. + +They must not be able to access these logs. + +### Logging storage space is limited in post-production + +On a production vehicle, the amount of storage space available for +logging is limited, so the system should log only the most important or +recent and relevant messages, and not write other messages to persistent +storage. + +### Record and replay logs for input to an application + +An application developer has found a problem in their application which +depends on external input to it, and subtle timing sequences of that +input. The input includes sensor input (from the SDK API, over D-Bus), +and user interactions with the interface using the touchscreen and +on-screen keyboard. This makes it a hard problem to reproduce. They want +to add a regression test for it to their application, and want to +automate it because reproducing the problem manually is too hard. This +regression test needs to perfectly reproduce the problem each time it is +run. + +The application has more than one process (it has one or more agent +processes, in addition to the main UI); all the processes communicate +with each other at runtime. + +### Record and replay logs for sensors to the whole system + +An Apertis tester wants to test the whole system against a variety of +road trips, but it would be a waste of time to repeatedly drive a +vehicle around a real road system in order to do repeat test runs. They +want a replayable log file of all the sensor inputs from the vehicle, +which can be replayed to the whole Apertis system on a development +machine, to allow repeated testing of how the system responds to those +inputs. + +### Performance profiling + +An application is performing poorly on the target device, and the +developer wants to diagnose the problem so they can fix it. + +### Denial of service attack on logging + +A misbehaving or malicious application is submitting log messages as +fast as it can. This should not adversely affect system performance, or +cause other log messages to be prematurely dropped. + +### Private application log file + +An application is being ported from another platform to Apertis, and it +already has its own logging infrastructure, storing log messages in a +private log file. The developers wish to keep this infrastructure, +rather than (or as well as) integrating with the Apertis logging +infrastructure. + +## Non-use-cases + +### Record and replay logs for entire system behaviour + +While [this use case][Record and replay logs for sensors to the whole system] is legitimate, +it becomes harder to record the *entire* system behaviour (as opposed to just the inputs from +the sensor system), as that starts to be affected by differences in the +components which are being tested if those components are changed to +test new features. For example, if the entire system behaviour were +recorded and replayed, it might not be possible to run a debugger on the +system while replaying a log, as the debugger would impact the replay +state too much. 
+
+## Requirements
+
+### Code debugger installable on development and target machines
+
+A code debugger must be available in Apertis, and installable on
+development and target machines so that it can be used by Apertis and
+application developers.
+
+The tool must allow interactively walking through the stack, printing
+expressions, and other common C debugging functions.
+
+See [][Debug deterministic application on SDK].
+
+### Code debugger can be used remotely
+
+The code debugger must be usable remotely in real time, most likely with
+a server component running on the target device, and a client component
+on the developer’s machine.
+
+See [][Debug application on target].
+
+### Code record and replay tool installable on development and target machines
+
+A code record and replay tool must be available in Apertis, and
+installable on development and target machines so that it can be used by
+Apertis and application developers.
+
+The tool must allow recording all inputs to an application from the
+kernel, plus any other system behaviour which would influence the
+application’s behaviour. Those logs must be stored as files, and be
+replayable many times.
+
+When replaying logs, the developer must be able to use a debugger to
+investigate problems.
+
+See:
+  - [][Debug non-deterministic application on SDK]
+  - [][Debug application in the context of the whole system]
+  - [][Record and replay logs for input to an application]
+
+### Application logs available in Eclipse when run on the SDK
+
+When developing an application in Eclipse, the logging calls the
+application uses must send their output to the Eclipse console (i.e.
+stdout or stderr) rather than (or as well as) the SDK system’s journal.
+This allows the developer to easily read those messages.
+
+See [][Debug deterministic application on SDK].
+
+### Whole system logs are aggregated and timestamped
+
+All log messages from all system components and services must be
+directed to a central logging repository, which must timestamp them all
+in order (so that all the timestamps are directly comparable).
+
+See [][Extract logs from a device under test] and
+[][Debug application in the context of the whole system].
+
+### Whole system logs are tagged by process and priority
+
+All log messages from all system components and services must be tagged
+with the name of the process which generated them, and their priority
+(for example, ‘debug’ versus ‘warning’ versus ‘error’). This metadata
+must be available to the developer to allow them to filter logs for
+relevant messages.
+
+See:
+  - [][Debug deterministic application on SDK]
+  - [][Debug application on target]
+
+### Whole system logs are limited by priority and rotated
+
+On a production vehicle, the log messages which are written to
+persistent storage must be limited to only the most recent logs
+(according to some age cutoff) and the most important logs (according to
+some priority cutoff). These cutoffs must be configurable at production
+time.
+
+It may be possible to keep all other log messages in memory while the
+vehicle is running, for example to allow them to be uploaded to an
+online diagnosis service in case of a fault. They must not, however, be
+written to disk.
+
+See [][Logging storage space is limited in post-production].
+
+### Extract whole system logs from target device
+
+The aggregated system log on a development target device must be
+accessible by the developer, who must be able to copy it to their
+development machine for analysis.
The log does not necessarily have to
+be extractable in real time, though that would be helpful.
+
+See [][Extract logs from a device under test].
+
+### Extract whole system logs from target device in post-production
+
+The aggregated system log on a production target device must be
+extractable by a trusted dealer so that it can be sent to an Apertis
+developer for analysis. Extracting the log may require physical access
+to the vehicle.
+
+See [][Trusted dealer can extract logs from a device post-production].
+
+### Protect access to whole system logs on production devices
+
+The aggregated system log on a production device must only be
+extractable by a trusted dealer or other authorised representative of
+the vehicle manufacturer.
+
+See [][Third party cannot extract logs from a device post-production].
+
+### Code record and replay tool can handle multiple processes
+
+The code record and replay tool must be able to record and replay a
+single log for multiple processes, such as an application and its
+agents. They must all see the same timing information.
+
+See [][Record and replay logs for input to an application].
+
+### Record and replay SDK sensor data
+
+It must be possible to record all D-Bus traffic to and from the SDK
+sensors API for a given time period (a ‘trip’), and later replay that
+log to the whole system instead of using current sensor data.
+
+See [][Record and replay logs for sensors to the whole system].
+
+### Profiling tools installable on development and target machines
+
+A variety of profiling tools must be available in Apertis, and
+installable on development and target machines so that they can be used
+by Apertis and application developers.
+
+See [][Performance profiling].
+
+### Rate limiting of whole system logs
+
+To prevent denial of service attacks on the system log, rate limiting
+must be applied to log message submissions from each application. If an
+application submits log messages at too high a rate, the extras must be
+dropped.
+
+See [][Denial of service attack on logging].
+
+### Applications can write their own log files
+
+Application developers may choose to ignore or supplement the Apertis
+SDK logging infrastructure with their own system which writes to a log
+file in their application’s storage space. This must be permitted,
+although the SDK does not have to provide convenience API for it.
+
+See [][Private application log file].
+
+### Disk usage for each application is limited
+
+An application is logging to its own private log file, rather than the
+system journal. The system must constrain the amount of disk space the
+application can use, so that it cannot prevent other applications from
+working by consuming all free disk space. If the application consumes
+too much disk space, the system may delete its files or prevent it from
+working.
+
+See [][Private application log file].
+
+## Existing debug and logging systems
+
+**Open question**: What existing debug and logging systems are relevant
+to do background research on?
+
+## Approach
+
+Based on the research and requirements above, we recommend the
+following approach as an initial sketch of a debug and logging system.
+
+### GDB and gdbserver
+
+For real-time debugging of applications, both on a local SDK system and
+on a remote target system, GDB should be used. For debugging remote
+systems, gdbserver should be set up on the remote system and GDB used as
+a client to control it.
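+
+As an illustration (the port number, target address and binary path here
+are examples only, not fixed Apertis conventions), a typical remote
+debugging session might look like this:
+
+```
+# On the target device: run the application under gdbserver, listening
+# on TCP port 2345
+gdbserver :2345 /usr/bin/example-app
+
+# On the development machine: start GDB against a local copy of the
+# binary with debug symbols, then attach to the remote session
+gdb /usr/bin/example-app
+(gdb) target remote 192.168.7.2:2345
+(gdb) continue
+```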
+
+They must both be available in the development repository, and hence
+installable on development and target devices.
+
+### Record and Replay (rr)
+
+For debugging of non-deterministic problems and problems which depend on
+context or state outside of the application, Mozilla’s Record and Replay
+(rr) tool should be used. It works by recording all input and output to
+a process (especially the input and output via kernel APIs), and
+allowing that log to be replayed while re-running the application. This
+eliminates all sources of non-determinism in the replay, ensuring that
+the conditions which triggered the original problem can be reproduced
+every time.
+
+Crucially, rr works with D-Bus: as all socket input and output for an
+application is recorded, this includes all D-Bus traffic, which is
+reproduced faithfully in any re-runs of the application. As many of the
+Apertis SDK APIs are provided via D-Bus, this is an essential feature.
+
+In addition, rr can record a group of processes to a single log, and
+replay to the same group later on. This can be used for debugging an
+application together with its agents, for example.
+
+Note, however, that rr is a *replay* tool and not an *interactive*
+debugger — a developer cannot replay a log recorded against one version
+of an application with a newer version of the application (for example,
+with changes which the developer hopes will fix the bug they’re
+investigating). This is because it would change the program’s output
+behaviour and hence its effects on external processes.
+
+For example, consider a bug where a program is writing a network packet
+to the wrong socket out of two it has open. rr has recorded the response
+from the socket the program was originally sending to (the wrong socket)
+— when a fixed version of the program is run, the log file rr is using
+will not have a response stored for the second (correct) socket.
+
+rr must be available in the development repository, and hence
+installable on development and target devices.
+
+### systemd journal
+
+All log output from processes on the target system should be sent to the
+systemd journal, allowing it to provide a single source of log data for
+the entire system, with all log messages in a single ordering. This
+includes debug messages, errors, warnings, and other log output. All
+messages should be sent with a priority level, plus additional metadata
+if relevant. The journal automatically adds the sending process’ name to
+log entries.
+
+When developing on a local SDK system, the log should be queried using
+the `journalctl` command line tool.
+
+If a program is run manually from a console, or from within Eclipse, all
+log output must also be sent to stdout or stderr so that it appears on
+the console or the [Eclipse console][gsystem-log].
+
+### Application log files
+
+If an application developer chooses to log their application’s messages
+to a private log file instead of, or as well as, to the systemd journal,
+this is permitted. The SDK need not provide convenience APIs for doing
+this, other than its APIs for file input and output. For example, it is
+up to the application developer to implement rate limiting, log file
+rotation and vacuuming.
+
+Applications must not be able to consume all available disk space and
+prevent the system or other applications from working. The safety
+measures to prevent this are detailed in the Robustness design document,
+and primarily involve putting each application’s storage area in
+a separate file system.
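+
+As a sketch of the kind of housekeeping this leaves to the application
+developer, a minimal size-based rotation script might look as follows
+(the log path and size limit are arbitrary examples):
+
+```
+#!/bin/sh
+# Rotate the application’s private log once it exceeds ~1 MiB,
+# keeping one previous generation so disk usage stays bounded.
+LOG=/path/to/app-storage/app.log
+MAX_BYTES=1048576
+
+if [ -f "$LOG" ] && [ "$(stat -c %s "$LOG")" -gt "$MAX_BYTES" ]; then
+    mv -f "$LOG" "$LOG.1"   # overwrite the previous generation
+    : > "$LOG"              # start a fresh, empty log
+fi
+```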
+
+Applications may only write to their own log files if they have
+permission to write to persistent storage, which is one of the standard
+permissions in the application manifest.
+
+### Diagnostic log and trace
+
+When testing a component on a target system, the developer should use
+Diagnostic Log and Trace (DLT) from GENIVI — this is a client–server
+system where the DLT daemon runs on the target system and forwards
+systemd journal messages over the network to the developer’s system,
+where they are presented in the DLT Viewer UI, which allows filtering,
+ordering, and other analysis to be performed on the logs.
+
+However, DLT is only as useful as the log messages sent to it by the
+components on the system. Certain components may need to be modified to
+emit more log messages.
+
+The DLT daemon exposes itself on the network and on the serial port with
+no authentication, so must not be installed by default on production
+systems.
+
+### Extracting logs from a post-production system
+
+For extracting logs from a post-production system, a new *journal export
+service* must be written which provides and authenticates access to the
+systemd journal.
+
+This service would essentially run the [`journalctl -o export`][systemd-export]
+command to retrieve a full copy of the system’s logs in a stable format
+suitable for sending to another system for review.
+
+The service would need to listen on some external interface which a
+trusted dealer could connect to. This could, for example, be a network
+port; or it could be a physical connector on the IVI system’s main
+board. In any case, the service must require authentication before
+exporting any logs.
+
+**Open question**: What external interface can the journal export
+service listen on?
+
+The authentication mechanism chosen depends partially on the
+characteristics of the interface the service listens on. It would most
+likely be a [challenge–response protocol] issued by the journal export
+service, where the trusted dealer proves knowledge of a secret
+which has been issued by the vehicle manufacturer.
+
+**Open question**: Should the logs be exported in an encrypted form, to
+keep them confidential while being stored by a trusted dealer?
+
+### D-Bus monitoring
+
+As many of the Apertis SDK APIs are provided via D-Bus, an easy way to
+see what they’re doing is to log all D-Bus traffic on the system and
+session buses. This can then be exposed by the DLT Viewer (or the local
+`journalctl` tool) and analysed.
+
+A new *D-Bus logging service* (similar to the `dbus-monitor` tool, but
+presented as a systemd service which developers can enable, and
+only on development images) should be written which logs all traffic for
+a specified D-Bus bus to the systemd journal.
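+
+As a rough illustration of the traffic such a service would capture,
+the same data can be gathered manually today (here writing to a file
+rather than to the journal):
+
+```
+# Capture all session bus traffic as human-readable text for later
+# analysis; the proposed service would send this to the journal.
+dbus-monitor --session > session-bus.log
+```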
+
+Note that this does not allow for log replay. For specific cases, this
+will be handled using [][Trip logging of SDK sensor data].
+
+### Trip logging of SDK sensor data
+
+In order to record ‘trip logs’ of the sensor data sent between the
+SDK sensor API and the rest of the system, a *D-Bus
+record and replay tool* should be written. When recording, this could
+monitor the D-Bus session bus and record all traffic to and from the
+sensor API. When replaying, it would replace the SDK sensor service on
+the bus, and impersonate all its APIs, replaying responses from the log.
+
+This program must be aware of the semantics of D-Bus messages so, for
+example, it would not store the serial number of a message reply, but
+would instead use the serial number corresponding to the method call at
+the time of replay. Similarly, it must be aware of common D-Bus
+interfaces such as `org.freedesktop.DBus.Properties` and know that the
+value of a property remains unchanged unless a notification signal has
+been emitted for it.
+
+One implementation option would be to implement this based on the
+`dbus-monitor` code: log all messages to or from the sensors API, and
+extract ones with known semantics, such as
+`org.freedesktop.DBus.Properties` method calls and signals. The replay
+code would maintain a queue of pairs of (expected method call, reply),
+and for each incoming method call, would return and remove the first
+matching reply from the queue, or would return an error otherwise. For
+calls to known interfaces like `org.freedesktop.DBus.Properties`, the
+property state would be emulated with the correct semantics.
+Asynchronous events, such as signal emissions from the sensors API,
+would be emitted at the appropriate time relative to their surrounding
+events, rather than based on the absolute timestamp they were originally
+logged at. For example, if the log contained a signal emission after
+method call A and before method reply B, that signal would only be
+emitted in the replayed log after the program under test had made method
+call A.
+
+An alternative implementation, which would be faster to implement but
+less generic and hence could not be repurposed for logging other SDK
+services in future, would be to use [python-dbusmock]
+to build a specific mock service for the sensors API. This service would
+have full knowledge of the semantics of all the D-Bus messages it sent and
+received — the full sensors SDK API, rather than just the standard D-Bus
+interfaces. The log file would be generated similarly to the first
+implementation — by monitoring and interpreting the D-Bus traffic for
+the sensors API. The file would contain an initial set of values for the
+properties of all the sensors, followed by timestamped updates to each
+value as it changed during logging.
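+
+To give a flavour of this second option, python-dbusmock can already
+spawn a mock service on the session bus and have mocked methods added
+at runtime; the bus name, object path and method below are hypothetical
+stand-ins for the real sensors API:
+
+```
+# Spawn an empty mock service on the session bus under a hypothetical
+# sensors bus name (all names here are illustrative only).
+python3 -m dbusmock --session org.example.Sensors /org/example/Sensors \
+    org.example.Sensors
+
+# In another terminal: add a mocked GetSpeed method which returns a
+# fixed value, using python-dbusmock’s AddMethod interface.
+dbus-send --session --print-reply --dest=org.example.Sensors \
+    /org/example/Sensors org.freedesktop.DBus.Mock.AddMethod \
+    string:org.example.Sensors string:GetSpeed string: string:d \
+    string:'ret = 13.9'
+```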
+
+A third, most specific, implementation option is to use the emulator
+backend service for the vehicle device daemon (see the Sensors and
+Actuators design), and feed the recorded trip logs to it. This has the
+advantage of re-using the vehicle device daemon’s SDK API, without
+having to mock it up. The emulator backend service has to be written
+anyway, in order to implement the sensors and actuators emulator (see
+section 8.4 of the Sensors and Actuators design, version 0.3.0). This
+would be the fastest implementation option, and the least re-usable.
+
+#### Example trip files
+
+To give application developers some baseline situations to test against,
+it would be helpful if Apertis or OEM variants of it shipped with
+several example trip logs, demonstrating some common or uncommon driving
+situations which applications must handle.
+
+**Open question**: Should example trip files be produced by Apertis, or
+by OEMs so they are specific to vehicles?
+
+### Security
+
+The security issues from logging are all concerned with confidentiality
+of system information, which may include sensitive data from a variety
+of processes.
+
+This data must be kept confidential, both within the system (for
+example, applications must not have access to the logs of any process
+which is not in their trust domain), and from external attackers.
+
+On production devices, especially, access to full system logs is a
+valuable goal for an attacker, as it gives insight into how the system
+is configured and further potential attack targets. For this reason, it
+may be worthwhile considering whether to reduce or disable logging on
+production systems.
+
+Conversely, log entries from production devices are very useful for
+debugging unreproducible post-production problems. Therefore, the
+choice of logging verbosity on production systems becomes a trade-off
+between the risk of confidentiality breaches, and the practicality of
+being able to debug problems.
+
+**Open question**: What level of logging should be enabled for
+production systems versus development systems?
+
+### Disk usage and performance
+
+Storing log entries persistently consumes an unbounded amount of disk
+space. A limit must be applied to the number or age of log entries which
+are stored before being dropped. The systemd journal must have a disk
+space or age limit applied; this can be done by editing
+`/etc/systemd/journald.conf` and adding the following, for example:
+
+```
+SystemMaxUse=100M
+```
+
+To limit the priority level of messages which are stored to disk, the
+following configuration option can be used; it is highly recommended to
+set it to ‘debug’ on development systems and ‘error’ for production
+systems.
+
+> The full range of options is documented in `man 5 journald.conf`.
+
+```
+MaxLevelStore=error
+```
+
+Logging must not have a large runtime overhead — each call from a
+process to the logging API must be fast. Furthermore, rate limiting must
+be applied to prevent a misbehaving application from overfilling the
+system logs. This can be achieved using the following configuration
+options for the systemd journal, which limit each process to at most
+1000 messages in any given 30-second period:
+
+```
+RateLimitInterval=30s
+RateLimitBurst=1000
+```
+
+As discussed in the Robustness design, the journal should
+additionally be configured to leave an amount of free space smaller than
+the reserved blocks of the file system containing the log files, so that
+log messages can continue to be written in low disk space conditions,
+allowing easier diagnosis of the problem:
+
+```
+SystemKeepFree=5%
+```
+
+### Profiling tools
+
+A variety of profiling tools should be packaged for the Apertis
+development repository:
+
+ - perf
+
+ - valgrind
+
+ - google-perftools
+
+ - strace
+
+ - ltrace
+
+ - systemtap
+
+ - gprof
+
+### Suggested roadmap
+
+GDB and DLT are already packaged, as are all the profiling tools, so no
+further work is needed there.
+
+rr is not yet packaged, but should be.
+
+The next step is to integrate all logging into the systemd journal,
+plus adding additional debug messages to various system services to
+improve the debuggability of those services.
+
+The journal export service, D-Bus logging service and D-Bus record and
+replay tools are all self-contained, so could be produced individually
+as later stages in the implementation.
+
+### Requirements
+
+ - [][Code debugger installable on development and target machines]:
+   GDB is the debugger.
+
+ - [][Code debugger can be used remotely]:
+   GDB can be used with gdbserver.
+
+ - [][Code record and replay tool installable on development and target machines]:
+   rr is the record and replay tool.
+
+ - [][Application logs available in Eclipse when run on the SDK]:
+   Log entries are output to stdout or stderr if running on a console.
+
+ - [][Whole system logs are aggregated and timestamped]:
+   All system logs are forwarded to the systemd journal. D-Bus messages are logged
+   to the journal via a new D-Bus logging service.
+
+ - [][Whole system logs are tagged by process and priority]:
+   Done by the systemd journal by default.
+
+ - [][Whole system logs are limited by priority and rotated]:
+   Done with suitable configuration of the systemd journal.
+
+ - [][Extract whole system logs from target device]:
+   DLT is used to extract logs and transfer them to a developer machine in real time.
+
+ - [][Extract whole system logs from target device in post-production]:
+   A new journal export service exposes an
+   authenticated interface for exporting systemd journal logs.
+
+ - [][Protect access to whole system logs on production devices]:
+   The journal export service requires authentication.
+
+ - [][Code record and replay tool can handle multiple processes]:
+   rr supports logging and replaying to multiple processes.
+
+ - [][Record and replay SDK sensor data]:
+   The D-Bus record and replay tool will be used for this.
+
+ - [][Profiling tools installable on development and target machines]:
+   Various profiling tools will be packaged.
+
+ - [][Rate limiting of whole system logs]:
+   Done with suitable configuration of the systemd journal.
+
+ - [][Applications can write their own log files]:
+   Allowed for any application which is allowed to write files.
+
+ - [][Disk usage for each application is limited]:
+   Each application writes to its own file system as in the Robustness design.
+
+## Open questions
+
+ - What existing debug and logging systems are relevant to do
+   background research on?
+
+ - What external interface can the journal export service listen on?
+
+ - Should the logs be exported in an encrypted form, to keep them
+   confidential while being stored by a trusted dealer?
+
+ - Should example trip files be produced by Apertis, or by OEMs
+   so they are specific to vehicles?
+
+ - What level of logging should be enabled for production systems
+   versus development systems?
+
+## Summary of recommendations
+
+As discussed in the above sections, we recommend:
+
+ - Packaging Mozilla’s Record and Replay (rr) tool for the development
+   repository.
+
+ - Ensuring that all system components and services are logging
+   exclusively to the systemd journal.
+
+ - Configuring the systemd journal to handle log expiry, rotation and
+   priority storage levels to avoid consuming unbounded disk space.
+
+ - Potentially adding more debug log messages to various system services
+   to give more context when debugging applications.
+
+ - Writing a journal export service for exporting the systemd journal
+   with authentication from a production system.
+
+ - Writing a D-Bus logging service for logging all D-Bus traffic to the
+   systemd journal to give more context when debugging applications.
+
+ - Writing a D-Bus record and replay tool for producing trip logs from
+   the SDK sensor API.
+
+ - Auditing the confidentiality of the systemd journal and ensuring it is
+   only accessible to developers and the journal export service.
+
+ - Writing documentation on how to use the Apertis SDK logging API, and
+   advice for application developers who want to use their own logging
+   systems.
+
+[gsystem-log]: https://git.gnome.org/browse/libgsystem/tree/src/gsystem-log.c#n128
+
+[systemd-export]: http://www.freedesktop.org/wiki/Software/systemd/export/
+
+[challenge–response protocol]: https://en.wikipedia.org/wiki/Challenge%E2%80%93response_authentication
+
+[python-dbusmock]: https://github.com/martinpitt/python-dbusmock
diff --git a/content/designs/development.md b/content/designs/development.md
new file mode 100644
index 0000000000000000000000000000000000000000..a8f333ae4559934b90c6b34b5b9fde8ac461ad2d
--- /dev/null
+++ b/content/designs/development.md
@@ -0,0 +1 @@
+# Apertis Development
diff --git a/content/designs/encrypted-updates.md b/content/designs/encrypted-updates.md
new file mode 100644
index 0000000000000000000000000000000000000000..cfb502a9ed77059d4947a41caf69f6a7cc85f8a4
--- /dev/null
+++ b/content/designs/encrypted-updates.md
@@ -0,0 +1,118 @@
+---
+title: Encrypted updates
+short-description: Offline update support with encrypted bundle
+authors:
+  - name: Frederic Danis
---
+
+# Encrypted updates
+
+## Introduction
+
+The encryption of the update file makes accessing its contents more difficult for bystanders, but doesn't necessarily protect from more resourceful attackers who can extract the decryption key from the user-owned device.
+
+The bundle encryption is done using the loop device with standard, proven kernel facilities for encryption and decryption (e.g. dm-crypt/LUKS). This keeps the mechanism system-agnostic (it is not tied to OSTree bundles), and it can be used to ship updates to multiple components at once by including multiple files in the bundle.
+dm-crypt is the Linux kernel module which provides transparent encryption
+of block devices using the kernel crypto API, see [dm-crypt](https://gitlab.com/cryptsetup/cryptsetup/-/wikis/DMCrypt).
+LUKS is the standard for Linux hard disk encryption. It provides secure management of multiple user passwords, see [LUKS wiki](https://gitlab.com/cryptsetup/cryptsetup/-/wikis/home).
+
+The authenticity of the update is instead checked by verifying the OSTree signature: dm-crypt uses symmetric cryptography, which cannot be used to establish trust, because the on-device key can be used to encrypt malicious files, not just to decrypt legitimate ones.
+
+## Threat model
+
+### Objectives
+ 1. end-users can download updates from the product website and apply them offline to the device via a USB key or SD card
+ 2. only official updates should be accepted by the device
+ 3. the contents of the updates should not be easily extracted, increasing the effort required for an attacker and providing some protection for the business' intellectual property
+
+### Properties
+ 1. **integrity**: the device should only accept updates which have not been altered
+ 2. **authenticity**: the device should only accept updates coming from the producer
+ 3. **confidentiality**: the contents of the updates should not be disclosed
+
+### Threats
+ 1. Alice owns a device and wants to make it run her own software
+ 2. Emily owns a device and Alice wants to have her own software on Emily's device
+ 3. Vincent develops a competing product and wants to gain insights into the inner workings of the device
+
+### Mitigations
+ 1. **integrity**: the update is checksummed, causing alteration to be detectable
+ 2. **authenticity**: the update is signed with a private key by the vendor and the device only accepts updates with a signature matching one of the public keys in its trusted set
+ 3. **confidentiality**: the update is encrypted with a symmetric key (due to technology limitations, public key decryption is not available)
+
+### Risks and impacts
+
+#### the private key for signing is leaked
+##### Impact
+  * the private key allows Alice to generate updates that can be accepted by all devices
+
+##### Mitigations
+  * the private key is only needed on the vendor infrastructure producing the updates
+  * the chance of leaks is minimized by securing the infrastructure and ensuring that access to the key is restricted as much as possible
+  * public keys for the leaked private keys should be revoked
+  * multiple public keys should be trusted on the device, so if one is revoked updates can be rolled out using a different key
+  * keys should not be re-used across products, to compartmentalize them against leaks
+
+#### the private key for signing is lost
+##### Impact
+  * updates can't be generated if no private key matching the on-device public ones is available
+
+##### Mitigations
+  * if multiple public keys are trusted on the device, the private key used can be rotated if another private key is still available
+  * backup private keys should be stored securely in different locations
+
+#### the symmetric key for encryption/decryption is leaked
+##### Impact
+  * Alice has access to all symmetric keys stored in bundles encrypted with the leaked key
+  * the symmetric key allows Alice to generate updates that can be decrypted by devices
+
+##### Mitigations
+  * due to its symmetric nature, the secret key has to be available on both the vendor infrastructure and on each device
+  * secure enclave technologies can help use the symmetric key for decryption without exposing the key in any way
+  * if a secure enclave is not available, the key has to be stored on the device and can be extracted via physical access
+  * if the key can't be provisioned in the factory, the key has to be provisioned via unencrypted updates, from which an attacker can extract the keys without physical access to the device
+  * multiple decryption keys must be provisioned, to be able to rotate them in case of leaks
+
+#### the symmetric key for encryption/decryption is lost
+##### Impact
+  * encrypted updates can't be generated for devices only using this symmetric key
+
+##### Mitigations
+  * given that the key has to be available on each device, the chance of losing the encryption/decryption key is small
+  * if multiple decryption keys are provisioned on the device, the encryption key can be rotated
+  * if all keys are lost or corrupted on the device, it will not be possible to decrypt bundles on a USB key or SD card, and so to update the device using this method
+
+## Key infrastructure
+
+LUKS is able to manage up to 8 key slots, and any of the 8 different keys can be used to decrypt the update bundle.
+This can allow a bundle to be read using a main key or fallback key(s), and/or by different devices holding different subsets of the keys used.
+
+On the device itself, the Apertis Update Manager is in charge of decrypting the bundle, and it will try as many keys as needed to unlock the bundle; there is no limitation on the number of keys which can be stored.
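+
+As an illustration of how such a multi-slot bundle could be produced with
+the standard cryptsetup tooling (the file names, sizes and contents are
+arbitrary examples, not the actual build pipeline):
+
+```
+# Create an empty bundle image and format it as a LUKS volume
+# protected by a main key file.
+truncate -s 256M bundle.img
+cryptsetup --batch-mode luksFormat bundle.img main.key
+
+# Add a fallback key to a second key slot; either key can now
+# decrypt the bundle.
+cryptsetup luksAddKey --key-file main.key bundle.img fallback.key
+
+# Open the volume (a loop device is set up automatically), create a
+# file system and copy the update contents in.
+cryptsetup open --key-file main.key bundle.img update-bundle
+mkfs.ext4 /dev/mapper/update-bundle
+mount /dev/mapper/update-bundle /mnt
+cp update-files/* /mnt/
+umount /mnt
+cryptsetup close update-bundle
+```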
+
+Random keys for bundle encryption can be generated using:
+
+```
+head -c128 /dev/random | base64 --wrap=0
+```
+
+### How keys can be stored on devices
+  - Keys can be stored in separate files, located in a read-only part of the filesystem: `/usr/share/apertis-update-manager/`
+  - In future versions, keys may be stored using the secure-boot-verified key storage system
+
+### How keys can be deployed to devices
+  - Keys stored in the filesystem can be deployed by the normal update mechanism
+
+### When new keys should be generated
+New keys should be generated:
+  - for new products
+  - when a key has been compromised
+
+### How the build pipeline can fetch the keys
+  - As for the signing key, the key(s) used to encrypt the static delta bundle should be passed to the encryption script via GitLab CI/CD variable(s)
+
+### How multiple keys can be used for key rotations
+  - When the keys are stored on the filesystem, key rotation will not provide any benefit, as the leak of one key implies the leak of the others
+  - When the keys are stored using the secure-boot-verified key storage system, the encrypted updates will be generated with non-leaked keys, and will remove the leaked keys while adding new keys to the secure-boot-verified key storage system, so the number of available keys remains the same
+
+### How to handle the leak of a key to the public and how that impacts future updates
+  - If the keys are stored on the filesystem, the leak of one key implies the leak of the others
+  - If the keys are stored using the secure-boot-verified key storage system, the next update should be signed with a key that hasn't been leaked, and the update should revoke the leaked key
diff --git a/content/designs/geolocation-and-navigation.md b/content/designs/geolocation-and-navigation.md
new file mode 100644
index 0000000000000000000000000000000000000000..d9775dbd3d9634972ae784adb3766099c7476959
--- /dev/null
+++ b/content/designs/geolocation-and-navigation.md
@@ -0,0 +1,2630 @@
+---
+title: Geolocation and navigation
+short-description: Existing solutions for geo-related services (implemented)
+authors:
+  - name: Philip Withnall
+---
+
+# Geolocation and navigation
+
+## Introduction
+
+This document describes existing solutions for geo-related features (sometimes
+known as location-based services, LBS) which could be integrated into
+Apertis for providing geolocation, geofencing, geocoding and navigation
+routing support for application bundles.
+
+As of version 0.3.0, the recommended solutions for most of the
+geo-requirements for Apertis are already implemented as open source
+libraries which can be integrated into Apertis. Some of them require
+upstream work to add smaller missing features.
+
+Larger pieces of work need to be done to add address completion and
+geofencing features to existing open source projects.
+
+The major considerations with all of these features are:
+
+  - Whether the feature needs to work offline or can require the vehicle
+    to have an internet connection.
+
+  - Privacy sensitivity of the data used or made available by the
+    feature — for example, access to geolocation or navigation routing
+    data is privacy sensitive as it gives the user’s location.
+
+  - All features must support pluggable backends to allow proprietary
+    solutions to be used if provided by the automotive domain.
+
+The scope of this design is restricted to providing services to
+applications which need to handle locations or location-based data.
+This design does not aim to provide APIs suitable for implementing a full
+vehicle navigation routing system — this is assumed to be provided by
+the [automotive domain], which may even provide some of the implementations
+of geo-related features used by other applications. This
+means that the navigation routing API suggested by this design is
+limited to allowing applications to interact with an external navigation
+routing system, rather than implement or embed one themselves.
+Recommendations for third parties implementing a navigation application
+are given in
+[][Appendix: Recommendations for third-party navigation applications].
+
+## Terminology and concepts
+
+### Coordinates
+
+Throughout this document, *coordinates* (or *a coordinate pair*) are
+taken to mean a latitude and longitude describing a single point in some
+well-defined coordinate system (typically WGS84).
+
+### Geolocation
+
+*Geolocation* is the resolution of the vehicle’s current location to a
+coordinate pair. It might not be possible to geolocate at any given
+time, due to unavailability of sensor input such as a GPS lock.
+
+### Forward geocoding
+
+*Forward geocoding* is the parsing of an address or textual description
+of a location, and returning zero or more coordinates which match it.
+
+### Reverse geocoding
+
+*Reverse geocoding* is the lookup of the zero or one addresses which
+correspond to a coordinate pair.
+
+### Geofencing
+
+*Geofencing* is a system for notifying application bundles when the
+vehicle enters a pre-defined ‘fenced’ area. For example, this can be
+used for notifying about jobs to do in a particular area the vehicle is
+passing through, or for detecting the end of a navigation route.
+
+### Route planning
+
+*Route planning* is where a start, destination and zero or more
+via-points are specified by the user, and the system plans a road
+navigation route between them, potentially optimising for traffic
+conditions or route length.
+
+### Route replanning
+
+*Route replanning* is where a route is recalculated to follow different
+roads, without changing the start, destination or any via-points along
+the way. This could happen if the driver took a wrong turn, or if
+traffic conditions change, for example.
+
+### Route cancellation
+
+*Route cancellation* is when a route in progress has its destination or
+via-points changed or removed. This does not necessarily happen when the
+vehicle is stopped or the ignition turned off, as route navigation could
+continue after an over-night stop, for example.
+
+### Point of interest
+
+A *point of interest* (*POI*) is a specific location which someone (a
+driver or passenger) might find interesting, such as a hotel,
+restaurant, fuel station or tourist attraction.
+
+### Route list
+
+A *route list* is the geometry of a navigation route, including the
+start point, all destinations and all vertices and edges which
+unambiguously describe the set of roads the route should use. Note that
+it is different from *route guidance*, which is the set of instructions
+to follow for the route.
+
+### Horizon
+
+The *horizon* is the collection of all interesting objects which are
+ahead of the driver on their route (‘on the horizon’). Practically, this
+is a combination of upcoming *points of interest*, and the remaining
+*route list*.
+
+### Route guidance
+
+*Route guidance* is the set of turn-by-turn instructions for following a
+navigation route, such as ‘take the next left’ or ‘continue along the
+A14 for 57km’.
+It is not the *route list*, which is the geometry of the
+route, but it may be possible to derive it from the route list.
+
+### Text-to-speech (TTS)
+
+*Text-to-speech (TTS)* is a user interface technology for presenting a
+user interface as computer-generated speech.
+
+### Location-based services (LBS)
+
+*Location-based services (LBS)* is another name for the collection of
+geo-related features provided to applications: geolocation, geofencing,
+geocoding and navigation routing.
+
+### Navigation application
+
+A *navigation application* is assumed (for the purposes of this
+document) to be an application bundle which contains
+
+  - a *navigation UI* for choosing a destination and planning a route;
+
+  - a *guidance UI* for providing guidance for that route while driving,
+    potentially also showing points of interest along the way;
+
+  - a *navigation service* which provides the (non-SDK) APIs used by the
+    two UIs, and can act as a backend for the various SDK geo-APIs; and
+
+  - a *routing engine* which calculates a recommended route to a
+    particular destination with particular parameters, and might be implemented
+    in either the IVI or automotive domain. It is conceptually part of the
+    *navigation service*.
+
+These two UIs may be part of the same application, or may be separate
+applications (for example with the guidance UI as part of the system
+chrome). The navigation service may be a separate process, or may be
+combined with one or both of the UI processes.
+
+The navigation service might communicate with systems in the automotive
+domain to provide its functionality.
+
+Essentially, the ‘navigation application’ is a black box which provides
+UIs and (non-SDK) services related to navigation. For this reason, the
+rest of the document does not distinguish between ‘navigation UI’,
+‘guidance UI’ and ‘navigation service’.
+
+### Routing request
+
+A *routing request* is a destination and set of optional parameters (waypoints,
+preferred options, etc.) from which the
+[routing engine][Navigation application] can calculate a specific route.
+
+## Use cases
+
+A variety of use cases for application bundle usage of geo-features are
+given below. Particularly important discussion points are highlighted at
+the bottom of each use case.
+
+In all of these use cases, unless otherwise specified, the functionality
+must work regardless of whether the vehicle has an internet connection,
+i.e. it must work offline. For most APIs, this is the responsibility
+of the automotive backend; [][SDK backend] can
+assume an internet connection is always available.
+
+### Relocating the vehicle
+
+If the driver is driving in an unfamiliar area and thinks they know
+where they are going, then realises they are lost, they must be able to
+turn on geolocation and it should pinpoint the vehicle’s location on a
+map if it’s possible to attain a GPS lock or get the vehicle’s location
+through other means.
+
+### Automotive backend
+
+A derivative of Apertis may wish to integrate its own geo-backend,
+running in the automotive domain, and providing all geo-functionality
+through proprietary interfaces. The system integrators may wish to use
+some functionality from this backend and other functionality from a
+different backend. They may wish to ignore some functionality from this
+backend (for example, if its implementation is too slow or is missing)
+and not expose that functionality to the app bundle APIs at all (if no
+other implementation is available).
+ +#### Custom automotive backend functionality + +The proprietary geo-backend in a derivative of Apertis may expose +functionality beyond what is described in this design, which the system +integrator might want to use in their own application bundles. + +If this functionality is found to be common between multiple variants, +the official Apertis SDK APIs may be extended in future to cover it. + +### SDK backend + +Developers using the Apertis SDK to develop applications must have +access to geo-functionality during development. All geo-functionality +must be implemented in the SDK. + +The SDK can be assumed to have internet access, so these implementations +may rely on the internet for their functionality. + +### Viewing an address on the map + +The user receives an e-mail containing a postal address, and they want +to view that address on a map. The e-mail client recognises the format +of the address, and adds a map widget to show the location, which it +needs to convert to latitude and longitude in order to pinpoint. + +In order to see the surrounding area, the map should be a 2D top-down +atlas-style view. + +In order for the user to identify the map area in relation to their +current journey, if they have a route set in the navigation application, +it should be displayed as a layer in the map, including the destination +and any waypoints. The vehicle’s current position should be shown as +another layer. + +### Adding custom widgets on the map + +A restaurant application wants to display a map of all the restaurants +in their chain, and wants to customise the appearance of the marker for +each restaurant, including adding an introductory animation for the +markers. They want to animate between the map and a widget showing +further details for each restaurant, by flipping the map over to reveal +the details widget. + +### Finding the address of a location on the map + +The user is browsing a tourist map application and has found an +interesting-looking place to visit with their friends, but they do not +know its address. They want to call their friends and tell them where to +meet up, and need to find out the address of the place they found on the +map. + +### Type-ahead search and completion for addresses + +A calendar application allows the user to create events, and each event +has an address/location field. In order to ease entering of locations, +the application wishes to provide completion options to the user as they +type, if any potential completion addresses are known to the system’s +map provider. This allows the user to speed up entry of addresses, and +reduce the frequency of typos on longer addresses while typing when the +vehicle is moving. + +If this functionality cannot be implemented, the system should provide a +normal text entry box to the user. + +### Navigating to a location + +A calendar application reminds the user of an event they are supposed to +attend. The event has an address set on it, and the application allows +the user to select that address and set it as the destination in their +navigation application, to start a new navigation route. + +Once the navigation application is started, it applies the user’s normal +preferences for navigation, and does not refer back to the calendar +application unless the user explicitly switches applications. + +### Navigating a tour + +A city tour guide application comes with a set of pre-planned driving +tour routes around various cities. 
+The driver chooses one, and it opens
+in the navigation application with the vehicle’s current position, a
+series of waypoints to set the route, and a destination at the end of
+the tour.
+
+At some of the waypoints, there is no specific attraction to see —
+merely a general area of the city which the tour should go through, but
+not necessarily using specific roads or visiting specific points. These
+waypoints are more like ‘way-areas’.
+
+### Navigating to an entire city
+
+The driver wants to navigate to a general area of the country, and
+refine their destination later or en route. They want to set their
+destination as an entire city (for example, Paris, rather than 1 Rue de
+Verneuil, Paris) and have this information exposed to applications.
+
+### Changing destination during navigation
+
+Part-way through navigating to one calendar appointment, the user gets a
+paging message from their workplace requiring them to divert to work
+immediately. The user is in an unfamiliar place, so needs to change the
+destination on their navigation route to take them to work — the paging
+application knows where they are being paged to, and has a button to set
+that as the new navigation destination. Clicking the button updates the
+navigation application to set the new destination and start routing
+there.
+
+#### Navigating via waypoints
+
+The user has to stop off at their house on their way to work to answer
+the paging message. The paging application knows this, and includes the
+user’s home address as a navigation waypoint on the way to their
+destination. This is reflected in the route chosen by the navigation
+application.
+
+### Tasks nearby
+
+A to-do list application may allow the user to associate a location with
+a to-do list item, and should display a notification if the vehicle is
+driven near that location, reminding the driver that they should pop by
+and do the task. Once the task is completed or removed, the geo-fenced
+notification should be removed.
+
+### Turning on house lights
+
+A ‘smart home’ application may be able to control the user’s house
+lights over the internet. If the vehicle is heading towards the user’s
+house, the app should be able to detect this and turn the lights on
+over the internet to greet the user when they get home.
+
+### Temporary loss of GPS signal
+
+When going through a tunnel, for example, the vehicle may lose sight of
+GPS satellites and no longer have a GPS fix. The system must continue to
+provide an estimated vehicle location to apps, with suitably increasing
+error bounds, if that is possible without reference to mapping data.
+
+### Navigation- and sensor-driven refinement of geolocation
+
+The location reported by the geolocation APIs may be refined by input
+from the navigation system or sensor system, such as snapping the
+location to the nearest road, or supplementing it with dead-reckoning
+data based on the vehicle’s velocity history.
+
+### Application-driven refinement of geocoding
+
+If the user installs an application bundle from a new restaurant chain
+(‘Hamburger Co’, who are new enough that their restaurants are not in
+commercial mapping datasets yet), and wants to search for such a
+restaurant in a particular place (London), they may enter ‘Hamburger Co,
+London’. The application bundle should expose its restaurant locations
+as a general [point of interest stream],
+and the geocoding system should query that in addition to its other sources.
+
+The user might find the results from a particular application
+consistently irrelevant or uninteresting, so might want to disable
+querying that particular application — but still keep the application
+installed to use its other functionality.
+
+#### Excessive results from application-driven refinement of geocoding
+
+A badly written application bundle which exposes a general point of
+interest stream might return an excessive number of results for a query
+— either results which are not relevant to the current geographic
+area, or too many results to reasonably display on the current map.
+
+### Malware application bundle
+
+A malicious developer might produce a malware application bundle which,
+when installed, tracks the user’s vehicle to work out opportune times to
+steal it. This should not be possible.
+
+### Traffic rule application
+
+The user is travelling across several countries in Europe, and finds it
+difficult to remember all the road signs, national speed limits and
+traffic rules in use in the countries. They have installed an
+application which reminds them of these whenever they cross a national
+border.
+
+### Installing a navigation application
+
+The user wishes to try out a third-party navigation application from the
+Apertis store, which is different to the system-integrator-provided
+navigation application which came with their vehicle. They install the
+application, and want it to be used as the default handler for
+navigation requests from now on.
+
+#### Third-party navigation-driven refinement of geolocation
+
+The third-party application has some advanced dead-reckoning technology
+for estimating the vehicle’s position, which was what motivated the user
+to install it. The user wants this refinement to feed into the
+geolocation information available to all the applications which are
+installed.
+
+#### Third-party navigation application backend
+
+If the user installs a full-featured third-party navigation application,
+they may want to use it to provide all geo-functionality in the system.
+
+### No navigation application
+
+A driver prefers to use paper-based maps to navigate, and has purchased
+a non-premium vehicle which comes without a built-in navigation
+application bundle, and has not purchased any navigation bundles
+subsequently (i.e. the system has no navigation application bundle
+installed).
+
+The rest of the system must still work, and any APIs which handle route
+lists should return an error code — but any APIs which handle the
+horizon should still include all other useful horizon data.
+
+### Web-based backend
+
+An OEM may wish to use (for example) Google’s web APIs for geo-services
+in their implementation of the system, rather than using services
+provided by a commercial navigation application. This introduces latency
+into a lot of the geo-service requests.
+
+### Navigation route guidance information
+
+A restaurant application is running on one screen while the driver
+follows a route in their navigation application on another screen. The
+passenger is using the restaurant application to find and book a place
+to eat later on in the journey, and wants to see a map of all the
+restaurants near the vehicle’s planned route, plus the vehicle’s
+current position and route estimates (such as time to the destination
+and time elapsed), so they can work out the best restaurant to choose.
+(This is often known as route guidance or driver assistance
+information.)
+
+While the passenger is choosing a restaurant, the driver decides to
+change their destination, or chooses an alternative route to avoid bad
+traffic; the passenger wants the restaurant application to update to
+show the new route.
+
+### 2.5D or 3D map widget
+
+A weather application would like to give a perspective view over a large
+area of the country which the vehicle’s route will take it through,
+showing the predicted weather in that area for the next few hours. It
+would like to give more emphasis to the weather nearby rather than
+further away, hence the need for perspective (i.e. a 2.5D or 3D view).
+
+### Separate route guidance UI
+
+An OEM wishes to split their navigation application into two parts: the
+navigation application core, which is used to find destinations and
+waypoints and to plan a route (including implementation of calculating
+the route, tracking progress through the journey, and recalculating in
+case of bad traffic, for example); and a guidance UI, which is always
+visible, and is potentially rendered as part of the system UI. The
+guidance UI needs to display the route, plus points of interest provided
+by other applications, such as restaurants nearby. It also needs to
+display status information about the vehicle, such as the amount of fuel
+left, the elapsed journey time, and route guidance.
+
+Explicitly, the OEM does *not* want the navigation application core to
+display points of interest while the user is planning their journey, as
+that would be distracting.
+
+### User control over applications
+
+The user has installed a variety of applications which expose data to
+the geo-services on the system, including points of interest and
+waypoint recommendations for routes. After a while, the user starts to
+find the behaviour of a fuel station application annoying, and while
+they want to continue to use it to find fuel stations, they do not want
+it to be able to add waypoints into their routes for fuel station stops.
+
+## Non-use-cases
+
+The following use cases are not in scope to be handled by this design —
+but they may be in scope to be handled by other components of Apertis.
+Further details are given in each subsection below.
+
+### POI data
+
+Use cases around handling of points of interest are covered by the
+[Points of interest design], which is orthogonal to the geo-APIs described
+here. This includes searching for points of interest nearby, displaying
+points of interest while driving past them, adding points of interest
+into a navigation route, and looking up information about points of
+interest. It includes requests from the navigation application or
+guidance UI to the points of interest service, and the permissions
+system for the user to decide which points of interest should be allowed
+to appear in the navigation application (or in other applications).
+
+### Beacons
+
+The iOS Location and Maps API supports advertising a device’s
+[location][iOS-beacon] using a low-power beacon, such as Bluetooth. This is not a
+design goal for Apertis at all, as advertising the location of a fast
+vehicle needs a different physical layer approach than beacons, which
+are designed for low-speed devices carried by people.
+
+### Loading map tiles from a backend
+
+There is no use case for implementing 2D map rendering via backends and
+(for example) loading map tiles *from a backend in the automotive
+domain*. 2D map rendering can be done entirely in the IVI domain using a
+single libchamplain tile source.
+At this time, the automotive domain
+will not carry 2D map tile data.
+
+This may change in future iterations of this document to, for example,
+allow loading pre-rendered map tiles or satellite imagery from the
+automotive domain.
+
+### SDK APIs for third-party navigation applications
+
+Implementing a navigation application is complex, and there are many
+approaches to it in terms of the algorithms used. In order to avoid
+implementing a lot of this complexity, and maintaining it as a stable
+API, the Apertis platform explicitly does not want to provide geo-APIs
+which are only useful for implementing third-party navigation
+applications.
+
+Third parties may write navigation applications, but the majority of
+their implementation should be internal; Apertis will not provide SDK
+APIs for routing, for example.
+
+## Requirements
+
+### Geolocation API
+
+Geolocation using GPS must be supported. Uncertainty bounds must be
+provided for the returned location, including the time at which the
+location was measured (in order to support cached locations). The API
+must gracefully handle failure to geolocate the vehicle (for example, if
+no GPS satellites are available).
+
+Locations must be provided with position in all three dimensions, speed
+and heading information if known. Locations should be provided with the
+other two angles of movement, the rate of change of angle of movement in
+all three dimensions, and uncertainty bounds and standard deviation for
+all measurements if known.
+
+See [][Relocating the vehicle].
+
+### Geolocation service supports signals
+
+Application bundles must be able to register to receive a signal
+whenever the vehicle’s location changes significantly. The bundle should
+be able to specify a maximum time between updates and a maximum distance
+between updates, either of which may be zero. The bundle should also be
+able to specify a *minimum* time between updates, in order to prevent
+being overwhelmed by updates.
+
+See [][Navigating to a location].
+
+### Geolocation implements caching
+
+If an up-to-date location is not known, the geolocation API may return a
+cached location with an appropriate time of measurement.
+
+See [][Relocating the vehicle], [][Tasks nearby].
+
+### Geolocation supports backends
+
+The geolocation implementation must support multiple backend
+implementations, with the selection of backend or backends to be used in
+a particular distribution of Apertis being a runtime decision.
+
+The default navigation application (see
+[][Navigation routing supports different navigation applications]) must
+be able to feed geolocation information into the service as an
+additional backend.
+
+See [][Automotive backend],
+[][Third-party navigation-driven refinement of geolocation].
+
+### Navigation routing API
+
+Launching the navigation application with zero or more waypoints and a
+destination must be supported. The navigation application can internally
+decide how to handle the new coordinates — whether to replace the
+current route, or supplement it. The application may query the user
+about this.
+
+The interface for launching the navigation application must also support
+‘way-areas’, or be extensible to support them in future. It must support
+marking some waypoints as to be announced, and some as to be used for
+routing but not announced as intermediate destinations.
+
+It must support setting a destination (or any of the waypoints) as an
+address or as a city.
+
+It must support systems where no navigation application is installed,
+returning an error code to the caller.
+
+It must provide a way to request navigation routing suitable for walking or
+cycling, so that the driver knows how to reach a point of interest after they
+leave the vehicle.
+
+See [][Navigating to a location],
+[][Changing destination during navigation],
+[][Navigating via waypoints],
+[][Navigating a tour],
+[][Navigating to an entire city],
+[][No navigation application].
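+
+As a purely illustrative sketch of the kind of launch interface this
+requirement implies — the bus name, object path, method and argument
+format below are hypothetical, not part of any defined Apertis API —
+such a request could look like:
+
+```
+# Hypothetical example: ask the default navigation application to
+# start routing to a destination via one waypoint, expressed as
+# latitude/longitude pairs. All names here are illustrative only.
+gdbus call --session \
+    --dest org.example.NavigationRouting \
+    --object-path /org/example/NavigationRouting \
+    --method org.example.NavigationRouting.Route \
+    "[(48.8583, 2.2945), (48.8606, 2.3376)]"
+```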
+
+### Navigation routing supports different navigation applications
+
+The mechanism for launching the navigation application (see
+[][Navigation routing API]) must allow a third-party navigation
+application to be set as the default, handling all requests.
+
+See [][Installing a navigation application],
+[][Third-party navigation-driven refinement of geolocation].
+
+### Navigation route list API
+
+A navigation route list API must be supported, which exposes information
+from the navigation application about the current route: the planned
+route, including destination, waypoints and way-areas.
+
+The API must support signalling changes in this information to
+applications, for example when the destination is changed or a new route
+is calculated to avoid bad traffic.
+
+If no navigation application is installed, the route list API must
+continue to be usable, and must return an error code to callers.
+
+See [][Navigation route guidance information],
+[][No navigation application].
+
+### Navigation route guidance API
+
+A route guidance API must be supported, which allows the navigation
+application to expose information about the directions to take next, and
+the vehicle’s progress through the current route. It must include:
+
+  - Estimates such as time to destination and time elapsed in the
+    journey. Equivalently, the journey departure and estimated arrival
+    times at each destination.
+
+  - Turn-by-turn navigation instructions for the route.
+
+The API must support data being produced by the navigation application
+and consumed by a single system-provided assistance or guidance UI,
+which is part of the system UI. It must support being called by the
+navigation application when the next turn-by-turn instruction needs to
+be presented, or the estimated journey end time changes.
+
+See [][Navigation route guidance information].
+
+### Type-ahead search and address completion supports backends
+
+The address completion implementation must support multiple backend
+implementations, with the selection of backend or backends to be used in
+a particular distribution of Apertis being a runtime decision.
+
+See [][Automotive backend],
+[][Type-ahead search and completion for addresses].
+
+### Geocoding supports backends
+
+The geocoding implementation must support multiple backend
+implementations, with the selection of backend or backends to be used in
+a particular distribution of Apertis being a runtime decision.
+
+See [][Automotive backend].
+
+### SDK has default implementations for all backends
+
+A free software, default implementation of all geo-functionality must be
+provided in the SDK, for use by developers. It may rely on an internet
+connection for its functionality.
+
+The SDK implementation must support all functionality of the geo-APIs in
+order to allow app developers to test all functionality used by their
+applications.
+
+See [][SDK backend].
+
+### SDK APIs do not vary with backend
+
+App bundles must not have to be modified in order to switch backends:
+the choice of backend should not affect implementation of the APIs
+exposed in the SDK to app bundles.
+
+If a navigation application has been developed by a vendor to use
+vendor-specific proprietary APIs to communicate with the automotive
+domain, that must be possible; but other applications must not use these
+APIs.
+
+See [][Automotive backend],
+[][Custom automotive backend functionality].
+
+### Third-party navigation applications can be used as backends
+
+A third-party or OEM-provided navigation application must, if it
+implements the correct interfaces, be able to act as a backend for some
+or all geo-functionality.
+
+See [][Third-party navigation application backend].
+
+### Backends operate asynchronously
+
+As backends for geo-functionality may end up making inter-domain
+requests, or may query web services, the interfaces between
+applications, the SDK APIs, and the backends must all be asynchronous
+and tolerant of latency.
+
+See the Inter-Domain Communications design.
+
+See [][Web-based backend].
+
+### 2D map rendering API
+
+Map display has the following requirements:
+
+  - Rendering the map (see [][Relocating the vehicle]).
+
+  - Rendering points of interest, including start and destination points
+    for navigation.
+
+  - Rendering a path or route.
+
+  - Rendering a polygon or region highlight.
+
+  - The map display must support loading client-side map tiles, or
+    server-provided ones.
+
+  - The map rendering may be done client-side (vector maps) or
+    pre-computed (raster maps).
+
+  - Rendering custom widgets provided by the application.
+
+  - Optionally rendering the current route list as a map layer.
+
+  - Optionally rendering the vehicle’s current position as a map layer
+    (see [][Relocating the vehicle]).
+
+See [][Viewing an address on the map],
+[][Adding custom widgets on the map].
+
+### 2.5D or 3D map rendering API
+
+For applications which wish to present a perspective view of a map, a
+2.5D or 3D map widget should be provided with all the same features as
+the 2D map rendering API.
+
+See [][2.5D or 3D map widget].
+
+### Forward geocoding API
+
+Forward geocoding must be supported, converting an address into zero or
+more coordinates. Limiting the search results to coordinates in a radius
+around a given reference coordinate pair must be supported.
+
+Forward geocoding may work when the vehicle has no internet connection,
+but only if that is easy to implement.
+
+See [][Viewing an address on the map].
+
+### Reverse geocoding API
+
+Reverse geocoding must be supported for querying addresses at selected
+coordinates on a map. Limiting the search results to a certain number of
+results should be supported.
+
+Reverse geocoding may work when the vehicle has no internet connection,
+but only if that is easy to implement.
+
+See [][Finding the address of a location on the map].
+
+### Type-ahead search and address completion API
+
+Suggesting and ranking potential completions to a partially entered
+address must be supported by the system, with latency suitable for use
+in a type-ahead completion system. This should be integrated into a
+widget for ease of use by application developers.
+
+Address completion may work when the vehicle has no internet connection,
+but only if that is easy to implement.
+
+This may need to be integrated with other keyboard usability systems,
+such as typing suggestions and keyboard history. If the functionality
+cannot be implemented or the service for it is not available, the system
+should provide a normal text entry box to the user.
+
+See [][Type-ahead search and completion for addresses].
+
+### Geofencing API
+
+Application bundles must be able to define arbitrary regions – either
+arbitrary polygons, or points with radii – and request a signal when
+entering, exiting, or dwelling in a region. The vehicle is dwelling in a
+region if it has been within it for a specified amount of time without
+exiting.
+
+See [][Tasks nearby], [][Turning on house lights].
+
+### Geofencing service can wake up applications
+
+It must be possible for geofencing signals to be delivered even if the
+application bundle which registered to receive them is not currently
+running.
+
+See [][Tasks nearby], [][Turning on house lights].
+
+### Geofencing API signals on national borders
+
+The geofencing API should provide a built-in geofence for the national
+borders of the current country; applications may subscribe to signals
+about it, and be woken up as normal, if the vehicle crosses the
+country’s border.
+
+See [][Traffic rule application].
+
+### Geocoding API must be locale-aware
+
+The geocoding API must support returning results or taking input, such
+as addresses, in a localised form. The localisation must be configurable
+so that, for example, the user’s home locale could be used, or the
+locale of the country the vehicle is currently in.
+
+See [][Traffic rule application].
+
+### Geolocation provides error bounds
+
+The geolocation API must provide an error bound for each location
+measurement it returns, so calling code knows how accurate that data is
+likely to be.
+
+See [][Temporary loss of GPS signal].
+
+### Geolocation implements dead-reckoning
+
+The geolocation API must implement dead-reckoning based on the vehicle’s
+previous velocity, to allow a location to be returned even if GPS signal
+is lost. This must update the error bounds appropriately
+([][Geolocation provides error bounds]).
+
+See [][Temporary loss of GPS signal].
+
+### Geolocation uses navigation and sensor data if available
+
+If such data is available, the geolocation API may use navigation and
+sensor data to improve the accuracy of the location it reports, for
+example by snapping the GPS location to the nearest road on the map
+using information provided by the navigation application.
+
+See [][Navigation- and sensor-driven refinement of geolocation].
+
+### General points of interest streams are queryable
+
+The general [point of interest streams][point of interest stream] exposed by
+applications must be queryable using a library API.
+
+The approach of exposing points of interest via the geocoding system, as
+results for reverse geocoding requests, was considered and decided against.
+Reverse geocoding is designed to turn a location (latitude and longitude) into
+information describing the nearest address or place — not into a series of
+results describing every point of interest within a certain radius. Doing so
+introduces problems with defining the search radius, determining which of the
+results is the geocoding result, and eliminating duplicate points of interest.
+
+See the [Points of interest design].
+
+Further requirements and designs specific to how applications expose
+such general points of interest streams are covered in the Points of
+Interest design.
+
+See [][Application-driven refinement of geocoding].
+
+### Location information requires permissions to access
+
+There are privacy concerns with allowing bundles access to location
+data.
+The system must be able to restrict access to any data which
+identifies the vehicle’s current, past or planned location, unless the
+user has explicitly granted a bundle access to it. The system may
+differentiate access into coarse-grained and fine-grained, for example
+allowing application bundles to request access to location data at the
+resolution of a city block, or at the resolution of tens of centimetres.
+Note that fine-grained data access must be allowed for geofencing
+support, as that essentially allows bundles to evaluate the vehicle’s
+current location against arbitrary location queries.
+
+Application bundles asking for fine-grained location data must be
+subjected to closer review when submitted to the Apertis application
+store.
+
+See [][Malware application bundle].
+
+**Open question**: What review checks should be performed on application
+bundles which request permissions for location data?
+
+### Rate limiting of general point of interest streams
+
+When handling general point of interest streams generated by
+applications, the system must prevent denial-of-service attacks from the
+applications by limiting the number of points of interest they can feed
+to the geolocation and other services, both in the rate at which they
+are transferred, and the number present in the system at any time.
+
+See [][Excessive results from application-driven refinement of geocoding].
+
+### Application-provided data requires permissions to create
+
+The user must be able to allow or prevent each application providing
+data to the system geo-services, such as route recommendations or points
+of interest, without needing to uninstall that application (i.e. so they
+can continue to use other functionality from the application).
+
+See [][User control over applications].
+
+## Existing geo systems
+
+This chapter describes the approaches taken by various existing systems
+for exposing geo-information to application bundles, because it might
+be useful input for Apertis’ decision making. Where available, it also
+provides some details of the implementations of features that seem
+particularly interesting or relevant.
+
+### W3C Geolocation API
+
+The [W3C Geolocation API] is a JavaScript API for exposing the user’s
+location to web apps. The API allows apps to query the current location,
+and to register for signals of position changes. Information about the
+age of location data (to allow for cached locations) is returned.
+Information is also provided about the location’s accuracy, heading and
+speed.
+
+### Android platform location API
+
+The [Android platform location API] is a low-level API for performing
+geolocation based on GPS or visible Wi-Fi and cellular networks, and
+does not provide geofencing or geocoding features. It allows geolocation
+and cached geolocation queries, as well as signals of changes in
+location. Its design is highly biased towards making apps
+energy-efficient so as to maintain mobile battery life.
+
+### Google Location Services API for Android
+
+The [Google Location Services API for Android] is a more fully featured
+API than the platform location API, supporting geocoding and
+geofencing in addition to geolocation. It requires the device to be
+connected to the internet to access Google Play services. It hides the
+complexity of calculating and tracking the device’s location much more
+than the platform location API.
+
+It allows apps to specify upper and lower bounds on the frequency at
+which they want to receive location updates.
+The location service then
+calculates updates at the maximum of the frequencies requested by all
+apps, and emits signals at the minimum of this and the app’s requested
+upper frequency bound.
+
+It also defines the permissions required for accessing location data
+more stringently, allowing coarse- and fine-grained access.
+
+### iOS Location and Maps API
+
+The [iOS Location Services and Maps API] is available on both iOS and OS X.
+It supports many features: geolocation, geofencing, forward
+and reverse geocoding, navigation routing, and local search.
+
+For geolocation, it supports querying the location and location change
+signals, including signals to apps which are running in the background.
+
+Its geofencing support is for points and radii, and supports entry and
+exit signals but not dwell signals. Instead, it supports hysteresis
+based on distance from the region boundary.
+
+Geocoding uses a network service; both forward and reverse geocoding are
+supported.
+
+The [MapKit API] provides an embeddable map renderer and widget,
+including annotation and overlay support.
+
+iOS (but not OS X) supports using arbitrary apps as routing providers
+for rendering [turn-by-turn navigation instructions][iOS-turn-by-turn].
+An app which supports this must declare which geographic regions it
+supports routing within (for example, a subway navigation app for New
+York would declare that region only), and must accept routing requests
+as a URI handler. The URIs specify the start and destination points of
+the navigation request.
+
+It also supports navigation routing using a system provider, which
+requires a network connection. Calculated routes include metadata such
+as distance, expected travel time, localised advisory notices, and the
+set of steps for the navigation. It supports returning multiple route
+options for a given navigation.
+
+The [local search API][iOS-local-search] differs from the geocoding API
+in that it supports *types* of locations, such as ‘coffee’ or ‘fuel’. As
+with geocoding, the local search API requires a network connection.
+
+### GNOME APIs
+
+GNOME uses several libraries to provide different geo-features. It does
+not have a library for navigation routing.
+
+#### GeoClue
+
+[GeoClue] is a geolocation service which supports multiple input
+backends, such as GPS, cellular network location and Wi-Fi-based
+geolocation.
+
+Wi-Fi location uses the [Mozilla Location Service] and requires
+network connectivity.
+
+It supports [geolocation signals][geoclue-signals] with a minimum
+distance between signals, but no time-based limiting. It does not
+support geofencing, but the developers are interested in implementing
+it.
+
+GeoClue’s security model allows permissions to be applied to individual
+apps, and location accuracy to be restricted on a per-app basis.
+However, this model is currently incomplete and does not query the
+system’s trusted computing base (TCB) (see the Security design for
+definitions of the TCB and trust).
+
+#### Geocode-glib
+
+[Geocode-glib] is a library for forward and reverse geocoding. It
+uses the Nominatim API, and is currently [hard-coded to query nominatim.gnome.org].
+It requires network access to perform geocoding.
+
+The [Nominatim API] does not require an API key (though it does require
+a contact e-mail address), but it is highly recommended that anyone
+using it commercially runs their own Nominatim server.
+
+geocode-glib is tied to a single Nominatim server, and does not support
+multiple backends.
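+
+As an indication of how compact the geocode-glib API is, the following
+minimal sketch forward-geocodes a free-form address string using the
+library’s synchronous API (an asynchronous variant also exists). It
+needs network access to reach the Nominatim server, and the address
+shown is illustrative only:
+
+---
+#include <geocode-glib/geocode-glib.h>
+
+int
+main (void)
+{
+  g_autoptr(GError) error = NULL;
+
+  /* Build a forward-geocoding query from a free-form string and run it
+   * synchronously against the configured Nominatim server. */
+  GeocodeForward *fwd = geocode_forward_new_for_string ("Baker Street, London");
+  GList *places = geocode_forward_search (fwd, &error);
+
+  /* Each result is a GeocodePlace with a name and a location. */
+  for (GList *l = places; l != NULL; l = l->next)
+    {
+      GeocodePlace *place = GEOCODE_PLACE (l->data);
+      GeocodeLocation *loc = geocode_place_get_location (place);
+
+      g_print ("%s: %f, %f\n",
+               geocode_place_get_name (place),
+               geocode_location_get_latitude (loc),
+               geocode_location_get_longitude (loc));
+    }
+
+  g_list_free_full (places, g_object_unref);
+  g_object_unref (fwd);
+  return error == NULL ? 0 : 1;
+}
+---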
+
+#### libchamplain
+
+[libchamplain] is a map rendering library, providing a map widget
+which supports annotations and overlays. It supports loading or
+rendering map tiles from multiple sources.
+
+#### NavIt
+
+[NavIt] is a 3D turn-by-turn navigation system designed for cars. It
+provides a GTK+ or SDL interface, audio output using espeak, GPS input
+using gpsd, and multiple map rendering backends. It seems to expose some
+of its functionality as a shared library (libnavit), but it is unclear
+to what extent it could be re-used as a component in an application
+without restructuring work. It may be possible to package it, with
+modifications, as a third-party navigation application, or as the basis
+of one.
+
+### Navigation routing systems
+
+Three alternative routing systems are described briefly below; a full
+analysis based on running many trial start and destination routing
+problems against them is yet to be done.
+
+#### GraphHopper
+
+[GraphHopper] is a routing system written in Java, which is
+available as a server or as a library for offline use. It uses
+OpenStreetMap data, and is licensed under the Apache License 2.0.
+
+#### OSRM
+
+[OSRM] is a BSD-licensed C++ routing system, which can be used as a
+server or as a library for offline use. It uses OpenStreetMap data.
+
+#### YOURS
+
+[YOURS] is an online routing system which provides a web API for
+routing using OpenStreetMap data.
+
+### NavServer
+
+NavServer is a proprietary middleware navigation solution, which
+accesses a core navigation system over D-Bus. It is designed to be used
+as a built-in navigation application.
+
+It is currently unclear to what extent NavServer could be used to feed
+data into the SDK APIs (such as geolocation refinements).
+
+### GENIVI
+
+GENIVI implements its geo-features and navigation as various components
+under the umbrella of the [IVI Navigation project].
+The core APIs it provides are detailed below.
+
+#### Navigation
+
+The navigation application is based on NavIt, using OpenStreetMap for
+its mapping data. It implements route calculation, turn-by-turn
+instructions, and map rendering.
+
+It is implemented in two parts: a navigation service, which uses
+libnavit to expose routing APIs over D-Bus; and the navigation UI, which
+uses these APIs. The navigation service is implemented as a set of
+plugins for NavIt.
+
+#### Fuel stop advisor
+
+The fuel stop advisor is a demo application which consumes information
+from the geolocation API and the vehicle API to get the vehicle’s
+location and fuel levels, in order to predict when (and where) would be
+best to refuel.
+
+#### POI service
+
+The points of interest (POI) service is implemented using multiple
+‘content access module’ (CAM) plugins, each providing points of
+interest from a different provider or source. When searching for POIs,
+the service sends the query to all CAMs, with a series of attributes and
+values to match against them using a set of operators (equal, less than,
+greater than, etc.), plus a latitude and longitude to base the query
+around. CAMs return their results to the service, which then forwards
+them to the client which originally made the search request.
+
+CAMs may also register categories of POIs which they can provide. These
+categories are arranged in a tree, so the user may limit their queries
+to certain categories.
+
+Additionally, POI searches may be conducted against a given route list,
+finding points of interest along the route line, instead of around a
+single centre point.
+
+Finally, it supports a ‘proximity alert’ feature, where the POI system
+will signal a client if the vehicle moves within a given distance of any
+matching POI.
+
+#### Positioning
+
+The positioning API provides a large amount of positioning data: time,
+position in three dimensions, heading, rate of change of position in
+three dimensions, rate of change of angle in three dimensions, precision
+of position in three dimensions, and standard deviation of position in
+three dimensions and heading.
+
+The service emits signals about position changes at arbitrary times, up
+to a maximum frequency of 10 Hz. It exposes information about the number
+of GPS satellites currently visible, and signals when this changes.
+
+### Google web APIs
+
+Google provides various APIs for geo-functionality. They are available
+to use subject to a billing scale which varies based on the number of
+requests made.
+
+#### Google Maps Geocoding API
+
+The [Google Maps geocoding API] is an HTTPS service which provides
+forward and reverse geocoding services, and provides results in JSON or
+XML format.
+
+Forward geocoding supports address formats from multiple locales, and
+supports filtering results by county, country, administrative region or
+postcode. It also supports artificially boosting the importance of
+results within a certain bounding box or region, and returning results
+in multiple languages.
+
+Reverse geocoding takes a latitude and longitude, and optionally a
+result type which allows the result to be limited to an address, a
+street, or country (for example).
+
+The array of potential results returned by both forward and reverse
+geocoding requests includes the location’s latitude and longitude, its
+address as a formatted string and as components, details about the
+address (if it is a postal address) and the type of map feature
+identified at those coordinates.
+
+The service is designed for geocoding complete queries, rather than
+partially-entered queries as part of a type-ahead completion system.
+
+#### Google Places API
+
+The [Google Places API] supports several different operations: returning
+a list of places which match a user-provided search string,
+returning details about a place, and auto-completing a user’s search
+string based on places it possibly matches.
+
+The search API supports returning a list of points of interest, and
+metadata about them, which are within a given radius of a given latitude
+and longitude.
+
+The details API takes an opaque identifier for a place (which is
+understood by Google and by no other services) and returns metadata
+about that place.
+
+The autocompletion API takes a partial search string and returns a list
+of potential completions, including the full place name as a string and
+as components, the portion which matched the input, and the type of
+place it is (such as a road, locality, or political area).
+
+#### Google Maps Roads API
+
+The [Google Maps Roads API] provides a snap-to-roads API, which takes
+a list of latitude and longitude points, and which returns the
+same list of points, but with each one snapped to the nearest road to
+form a likely route which a vehicle might have taken.
+
+The service can optionally interpolate the result so that it contains
+more points to smoothly track the potential route, adding points where
+necessary to disambiguate between different options.
+
+#### Google Maps Geolocation API
+
+The [Google Maps Geolocation API] provides a way for a mobile device
+(any device which can detect mobile phone towers or Wi-Fi access points)
+to look up its likely location based on which mobile phone towers and
+Wi-Fi access points it can currently see.
+
+The API takes some details about the device’s mobile phone network and
+carrier, plus a list of identifiers for nearby mobile phone towers and
+Wi-Fi access points, and the signal strength the device sees for each of
+them. It returns a best guess at the device’s location, as a latitude,
+longitude and accuracy radius around that point (in metres).
+
+If the service cannot work out a location for the device, it tries to
+geolocate based on the device’s IP address; this will always return a
+result, but the accuracy will be very low. This option may be disabled,
+in which case the service will return an error on failure to work out
+the device’s location.
+
+## Approach
+
+Based on the [][Existing geo systems] and [][Requirements], we
+recommend the following approach for integrating geo-features into
+Apertis. The overall summary is to use existing freedesktop.org and
+GNOME components for all geo-features, adding features to them where
+necessary, and adding support for multiple backends to support
+implementations in the automotive domain or provided by a navigation
+application.
+
+### Backends
+
+Each of the geo-APIs described in the following sections will support
+multiple backends. These backends must be selectable at runtime so that a
+newly installed navigation application can be used to provide
+functionality for a backend. Switching backends may require the vehicle
+to be restarted, so that the system can avoid the complexities of
+transferring state between the old backend and the new one, such as
+route information or GPS location history.
+
+Applications must not be able to choose which backend is being used for
+a particular geo-function — that is set as a system preference, either
+chosen by the user or fixed by the system integrator.
+
+If there are particular situations where it is felt that the application
+developer knows better than the system integrator about the backend to
+use, that signals a use case which has not been considered, and might be
+best handled by revising this design and potentially introducing a
+new SDK API to expose the backend functionality desired by the
+application developer.
+
+Backends may be implemented in the IVI domain (for example, the default
+backends in the SDK must be implemented in the IVI domain, as the SDK
+has no other domains), or in the automotive domain. If a backend is
+implemented in the automotive domain, its functionality must be exposed
+as a proxy service in the IVI domain, which implements the SDK API.
+Communications between this proxy service and the backend in the
+automotive domain will be over the inter-domain communications
+link.
+
+> See the Inter-Domain Communications design
+
+This IPC interface serves as a security boundary for the backend.
+
+Third-party applications (such as navigation applications) may provide
+backends for geo-services as dynamically loaded libraries which are
+installed as part of their application bundle.
+As this allows arbitrary
+code to be run in the context of the geo-services (which form security
+boundaries for applications, see [][Systemic security]),
+the code for these third-party backends must be audited and carefully
+tested ([][Testing backends]) as part of the app store validation
+process.
+
+Due to the potential for inter-domain communications, or for backends
+which access web services to provide functionality, the backend and SDK
+APIs must be asynchronous and tolerant of latency.
+
+Backends may expose functionality which is not abstracted by the SDK
+APIs. This functionality may be used by applications directly, if they
+wish to be tied to that specific backend. As noted above, this may
+signal an area where the SDK API could be improved or expanded in
+future.
+
+### Navigation application
+
+Throughout this design, the phrase *navigation application* should be
+taken to mean the navigation application bundle, including its UI, a
+potentially separate [][Route guidance UI] and any agents or
+backends (see [][Backends]). While the navigation application
+bundle may provide backends which feed data to a lot of the geo-APIs in
+the SDK, it may additionally use a private connection and arbitrary
+protocol to communicate between its backends and its UI. Data being
+presented in the navigation application UI does not necessarily come
+from SDK APIs. See [this non-use-case][SDK APIs for third-party navigation applications].
+
+A system might not have a navigation application installed (see
+[][Navigation routing API]), in which case all APIs
+which depend on it must return suitable error codes.
+
+### 2D map display
+
+[libchamplain] should be used for top-down map display. It supports
+map rendering, adding markers, points of interest (with explanatory
+labels), and start and destination points. Paths, routes, polygons and
+region highlights can be rendered using the Clutter API on custom map
+layers.
+
+libchamplain supports pre-rendered tiles from online
+(ChamplainNetworkTileSource) or offline (ChamplainFileTileSource)
+sources. It supports rendering tiles locally using libmemphis, if
+compiled with that support enabled.
+
+On an integrated system, map data will be available offline on the
+vehicle’s file system. On the Apertis SDK, an internet connection is
+always assumed to be available, so map tiles may be used from online
+sources. libchamplain supports both.
+
+libchamplain supports rendering custom widgets provided by the
+application on top of the map layer.
+
+#### Route list layer on the 2D map
+
+A new libchamplain layer should be provided by the Apertis SDK which
+renders the vehicle’s current route list as an overlay on the map,
+updating it as necessary if the route is changed. If no route is set,
+the layer must display nothing (i.e. be transparent).
+
+Applications can add this layer to an instance of libchamplain in order
+to easily get this functionality.
+
+In order for this to work, the application must have permission to query
+the vehicle’s route list.
+
+#### Vehicle location layer on the 2D map
+
+A new libchamplain layer should be provided by the Apertis SDK which
+renders the vehicle’s current location as a point on the map using a
+standard icon or indicator. The location should be updated as the
+vehicle moves. If the vehicle’s location is not known, the layer must
+display nothing (i.e. be transparent).
+
+Applications can add this layer to an instance of libchamplain in order
+to easily get this functionality.
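+
+As an illustration, adding the proposed layer to a map view might look
+like the following minimal sketch. The constructor name
+`apertis_vehicle_location_layer_new()` is hypothetical (this design
+proposes the layer but does not define its API); the libchamplain calls
+are real:
+
+---
+#include <champlain/champlain.h>
+
+/* Hypothetical sketch: embed the SDK-provided vehicle-location layer in
+ * an application’s map view. apertis_vehicle_location_layer_new() is an
+ * assumed name for the proposed constructor. */
+static ClutterActor *
+create_map_with_vehicle_layer (void)
+{
+  ChamplainView *view = CHAMPLAIN_VIEW (champlain_view_new ());
+
+  /* The layer renders the vehicle’s position by itself (or nothing, if
+   * the location is unknown); the application just adds it. */
+  ChamplainLayer *vehicle_layer = apertis_vehicle_location_layer_new ();
+  champlain_view_add_layer (view, vehicle_layer);
+
+  return CLUTTER_ACTOR (view);
+}
+---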
+ +In order for this to work, the application must have permission to query +the vehicle’s location. + +### 2.5D or 3D map display + +Our initial suggestion is to use [NavIt] for 3D map display. +However, we are currently unsure of the extent to which it can be used +as a library, so we cannot yet recommend an API for 2.5D or 3D map +display. Similarly, we are unsure of the extent to which route +information and custom rendering can be input to the library to +integrate it with other routing engines; or whether it always has to use +routes calculated by other parts of NavIt. + +**Open question**: Is it possible to use NavIt as a stand-alone 2.5D or +3D map widget? + +### Geolocation + +[GeoClue] should be used for geolocation. It supports multiple +backends, so closed source as well as open source backends can be used. +Some of the advanced features which do not impact on the API could be +implemented in an automotive backend, although other backends would +benefit if they were implemented in the core of GeoClue instead. For +example, cached locations and dead-reckoning of the location based on +previous velocity for when GPS signal is lost. + +> The vehicle’s velocity may be queried from the sensors; see the +> Sensors and Actuators design + +GeoClue supports geolocation using GPS (from a modem), 3G and Wi-Fi. It +supports [accuracy bounds for locations][geoclue-accuracy-bounds], +but does not pair that with information about the time of measurement. That would need to be +added as a new feature in the API. Speed, altitude and bearing +information [are supported][geoclue-location-props]. +The other two angles of movement, the rate of change of angle of movement in all three dimensions, and +uncertainty bounds and standard deviation for non-position measurements +are not currently included in the API, and should be added. + +The API already supports signalling of failure to geolocate the vehicle, +by setting its Location property to ‘/’ (rather than a valid +org.freedesktop.GeoClue2.Location object path). + +If the navigation application implements a snap-to-road feature, it +should be used as a further source of input to GeoClue for refining the +location. + +#### Geolocation signals + +GeoClue emits a [LocationUpdated signal][geoclue-location-signal] +whenever the vehicle’s location changes more than the [DistanceThreshold][geoclue-distance-threshold]. +GeoClue currently does not support rate limiting emission of the LocationUpdated +signal for minimum and maximum times between updates. That would need to +be added to the core of GeoClue. + +### Navigation routing + +Navigation routing will be implemented internally by the OEM-provided +navigation application, or potentially by a third-party navigation +application installed by the user. In either case, there will not be an +Apertis SDK API for calculating routes. + +However, the SDK will provide an API to launch the navigation +application and instruct it to calculate a new route. This API is the +[content hand-over] API, where the navigation application can be +launched using a nav URI. The nav URI scheme is a custom scheme (which +must not be confused with the standard [geo URI scheme], which is +for identifying coordinates by latitude and longitude). See [][Appendix: nav URI scheme] +for a definition of the scheme, and examples. 
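+
+Although the content hand-over mechanism itself is specified elsewhere,
+the effect for application code is a single launch request against
+whatever application is registered as the handler for the nav URI
+scheme. A hedged sketch, assuming the handler lookup goes through GIO’s
+default-handler machinery (an assumption, not something this design
+mandates), with an illustrative query string rather than the normative
+syntax from the appendix:
+
+---
+#include <gio/gio.h>
+
+/* Sketch: ask the default handler for the ‘nav’ URI scheme to start
+ * routing to a destination. The URI contents are illustrative only;
+ * the real scheme is defined in the appendix. */
+static void
+request_navigation (void)
+{
+  g_autoptr(GError) error = NULL;
+
+  if (!g_app_info_launch_default_for_uri ("nav:q=Example+Destination",
+                                          NULL, &error))
+    g_warning ("No navigation application available: %s", error->message);
+}
+---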
+
+When handling a content hand-over request, the navigation application
+should give the user the option of whether to replace their current
+route with the new route — but this behaviour is internal to the
+navigation application, and up to its developer and policy.
+
+The behaviour of the navigation application if it is passed an invalid
+URI, or one which it cannot parse (for example, due to not understanding
+the address format it uses), is not defined by this specification.
+
+As per the [content hand-over design], the user may choose a third-party
+navigation application as the handler for nav URIs, in which
+case it will be launched as the default navigation application.
+
+If no navigation application is installed, or none is set as the handler
+for nav URIs, the error must be handled as per the content hand-over
+design.
+
+### Navigation route list API
+
+Separately from the content hand-over API for sending a destination to
+the navigation application (see [][Navigation routing]),
+there should be a navigation route list API to expose information about
+the route geometry to SDK applications.
+
+This API will be provided by the SDK, but implemented by one of many
+backends — the navigation application will be one such backend, but
+another backend will be available on the SDK to expose mock data for
+testing applications against. The mock data will be provided by an
+emulator; see [][Testing applications] for more information.
+
+These backends will feed into a system service which provides the SDK
+API to applications, and exposes:
+
+  - The planned route geometry, including destination, waypoints and
+    way-areas.
+
+  - Potential alternative routes.
+
+The API will not expose the vehicle’s current position — that is
+provided by [][Geolocation]. Similarly, it will not expose points
+of interest or other horizon data — that is provided by the
+[points of interest API][Points of interest design] — or guidance
+information, which is provided by the [][Navigation route guidance API].
+
+The API should emit signals on changes of any of this information, using
+the standard org.freedesktop.DBus.Properties signals. We recommend the
+following API, exposed at the well-known name
+org.apertis.Navigation1:
+
+---
+/* This object should implement the org.freedesktop.DBus.ObjectManager standard
+   API to expose all existing routes to clients.
+*/
+object /org/apertis/Navigation1/Routes {
+  read-only i property CurrentRoute; /* index into routes, or negative for ‘no current route’ */
+  read-only ao property Routes; /* paths of objects representing potential routes, with the most highly recommended ones first */
+}
+
+/* One of these objects is created for each potential route.
+   Each object path is suffixed by a meaningless ID to ensure its uniqueness.
+   Route objects are immutable once published. Any change should be done by
+   adding a new route and removing the old one.
+*/
+object /org/apertis/Navigation1/Routes/$ID {
+  read-only a(dd) property Geometry; /* array of latitude–longitude pairs in order, from the start point to the destination (inclusive) */
+  read-only u property TotalDistance; /* in metres */
+  read-only u property TotalTime; /* in seconds */
+  read-only a{ss} property Title; /* mapping from language name to a human-readable title of the route in this language. Language names use the POSIX locale format with locale identifiers defined by ISO 15897, like fr_BE for example. */
+  read-only a{ua{ss}} property GeometryDescriptions; /* array of pairs of index and description, which attach a description, in various languages, to points in the geometry property. Language names use the same format as in the Title property. */
+}
+---
+
+> Syntax is a pseudo-IDL and types are as defined in the D-Bus
+> specification, <http://dbus.freedesktop.org/doc/dbus-specification.html#type-system>
+
+#### Navigation route list backends
+
+The backend for the route list service is provided by the navigation
+application bundle, which may be built-in or provided by a
+user-installed bundle. If no navigation bundle is installed, no backend
+is used, and the route list service must return an error when called.
+
+On the Apertis SDK, a navigation bundle is not available, so a mock
+backend should be written which presents route lists provided by the
+developer. This will allow applications to be tested using route lists
+of the developer’s choice.
+
+### Navigation route guidance progress API
+
+There should be a navigation route guidance API to expose information
+about progress on the current route.
+
+This API will be provided as interfaces by the SDK, but implemented by
+the navigation application. The UI responsible for displaying progress
+information, or third-party applications, can use this API to fetch the
+information from the navigation application.
+
+The route guidance progress API should be a D-Bus API which exposes
+estimates such as time to destination and time elapsed in the
+journey.
+
+The API should emit signals on changes of any of this information, using
+the standard `org.freedesktop.DBus.Properties` signals. We recommend the
+following API, exposed at the well-known name
+`org.apertis.NavigationGuidance1.Progress` on the object
+`/org/apertis/NavigationGuidance1/Progress`:
+
+---
+interface org.apertis.NavigationGuidance1.Progress {
+  read-only t property StartTime; /* UNIX timestamp, to be interpreted in the local timezone */
+  read-only t property EstimatedEndTime; /* UNIX timestamp, to be interpreted in the local timezone */
+}
+---
+
+Additional properties may be added in future.
+
+> Syntax is a pseudo-IDL and types are as defined in the D-Bus
+> specification, <http://dbus.freedesktop.org/doc/dbus-specification.html#type-system>
+
+### Navigation route guidance turn-by-turn API
+
+There should be a navigation route guidance API to allow turn-by-turn guidance
+notifications to be presented.
+
+This API will be provided as interfaces by the SDK, but implemented by a
+system component which is responsible for presenting notifications —
+this will most likely be the compositor. The navigation application can
+then call the TurnByTurn API to display new turn-by-turn driving
+instructions.
+
+The route guidance turn-by-turn API should be a D-Bus API which exposes
+a method for presenting the next turn-by-turn navigation instruction.
+
+We recommend the following API, exposed at the well-known name
+`org.apertis.NavigationGuidance1.TurnByTurn` on the object
+`/org/apertis/NavigationGuidance1/TurnByTurn`:
+
+---
+interface org.apertis.NavigationGuidance1.TurnByTurn {
+  /* See https://people.gnome.org/~mccann/docs/notification-spec/notification-spec-latest.html#command-notify */
+  method Notify (in u replaces_id,
+                 in s icon_name,
+                 in s summary,
+                 in s body,
+                 in a{sv} hints,
+                 in i expire_timeout,
+                 out u id)
+
+  /* See https://people.gnome.org/~mccann/docs/notification-spec/notification-spec-latest.html#command-close-notification */
+  method CloseNotification (in u id)
+}
+---
+
+> Syntax is a pseudo-IDL and types are as defined in the D-Bus
+> specification, <http://dbus.freedesktop.org/doc/dbus-specification.html#type-system>
+
+The design of the turn-by-turn API is based heavily on the
+freedesktop.org [Notifications specification],
+and could share significant amounts of code with the implementation of normal
+(non-guidance-related) notifications.
+
+#### Route guidance UI
+
+By using a combination of the navigation route guidance API and the
+[points of interest API][Points of interest design], it should be possible for an OEM to
+provide a route guidance UI which is separate from their main navigation
+application UI, but which provides sufficient information for route
+guidance and display of points of interest, as described in [][Separate route guidance UI].
+
+There is no need for a separate API for this, and it is expected that if
+an OEM wishes to provide a route guidance UI, they can do so as a
+component in their navigation application bundle, or as part of the
+implementation of the system chrome (depending on their desired user
+experience). The only requirement is that only one component on the
+system implements the route guidance D-Bus interfaces described above.
+
+### Horizon API
+
+The [][Horizon] API is a shared library which
+applications are recommended to use when they want to present horizon
+data in their UI — a combination of the route list and upcoming points
+of interest. The library should query the [][Navigation route list API]
+and [points of interest APIs][Points of interest design]
+and aggregate the results for display in the application. It is
+provided as a convenience and does not implement functionality which
+applications could not otherwise implement themselves. The library
+should be usable by applications and by system components.
+
+Any OEM-preferred policy for aggregating points of interest may be
+implemented in the horizon library in order to be easily usable by all
+applications; but applications may choose to query the SDK route list
+and points of interest APIs directly to avoid this aggregation and
+implement their own instead.
+
+To use the horizon API, an application must have permission to query
+both the route list API and the points of interest API.
+
+What applications do with the horizon data once they have received it is
+up to the application — they may, for example, store it in a local cache
+to prevent re-querying for historical data in future. This
+is entirely up to the application developer, and is out of the scope of
+this design.
+
+There was a choice between implementing the horizon API as a library or
+as a service. Implementing it as a service would not reduce memory
+consumption, as all consumers would likely still have to keep all
+horizon data in memory for rendering in their UI.
+It would not reduce
+CPU consumption, as aggregation of horizon data is likely to be simple
+(merging points of interest with the same name and location). It would
+double the number of IPC hops each piece of information had to make
+(changing from producer → consumer, to producer → horizon service →
+consumer). As all consumers of horizon data are likely to be interested
+in most points of interest, it would not significantly reduce the amount
+of data needing to be forwarded between producers and consumers. Hence a
+library is a more appropriate solution.
+
+One potential downside of using a library is that it is harder to
+guarantee consistency in the horizon seen by different applications —
+the implementation should be careful to be deterministic so that users
+see the same horizon in all applications using the library.
+
+See the following figure for a diagram of the flow of points of interest
+and route lists around the system. Key points illustrated are:
+
+  - Producers of points of interest choose which POIs are sent to which
+    consumers (there is no central POI service), though the horizon
+    library may perform filtering or aggregation according to system
+    policy.
+
+  - The route list API is provided by the SDK, but uses one backend out
+    of zero or more provided by the navigation application bundle or
+    other bundles.
+
+  - The navigation UI is another application with no special powers; the
+    arbitrary communications between its UI and backend may carry route
+    lists, points of interest, or other data, but does not have to use
+    the formats or APIs defined in the SDK.
+
+  - Applications choose which waypoints to send via the
+    [navigation routing API][Navigation routing] to be added into the
+    route — this might result in the user being prompted ‘Do you want to
+    add this waypoint to your route?’, or they might be added
+    unconditionally. For example, the fuel station application bundle
+    may add a fuel stop waypoint to the route, which is then exposed in
+    the route list API if the navigation application accepts it.
+
+  - The navigation UI does not necessarily use information from the
+    route list API to build its UI (although such information may
+    contribute).
+
+  - If no navigation bundle is installed, the SDK route list service
+    still exists; it just returns no data.
+
+
+
+### Forward and reverse geocoding
+
+We recommend that [Geocode-glib] is used as the SDK geocoding API.
+Geocode-glib is currently hard-coded to use GNOME’s Nominatim service;
+it would need to be modified to support multiple backends, such as an
+Apertis-specific [Nominatim server](http://wiki.openstreetmap.org/wiki/Nominatim),
+or a geocoding service from the automotive backend. It only needs to support
+querying one backend at a time; results do not need to be aggregated.
+Backends should be loadable and unloadable at runtime.
+
+The geocode-glib API supports forward and reverse geocoding, and
+supports limiting search results to a given bounding box.
+
+#### Geocoding backends
+
+On an integrated system, geocoding services will be provided by the
+automotive domain, via inter-domain communications (see the Inter-Domain
+Communications design). On the Apertis
+SDK, an internet connection is always assumed to be available, so
+geocoding may be performed using an online Nominatim backend. In both
+cases, geocode-glib would form the SDK API used by applications.
+Another
+alternative backend, which could be written and used by OEMs instead of
+an automotive backend, would be a [Google Maps geocoding API]
+backend which runs in the IVI domain.
+
+Although it is tempting to do so, points of interest should *not* be fed into
+geocode-glib via another new backend, as the semantics of points of interest (a
+large number of points spread in a radius around a focal point) do not match
+the semantics of reverse geocoding (finding the single most relevant address to
+describe a given latitude and longitude). Points of interest should be queried
+separately using a library provided by the
+[Points of Interest design][Points of interest design].
+
+#### Localisation of geocoding
+
+Nominatim supports exporting localised place names from OpenStreetMap,
+but geocode-glib does not currently expose that data in its query
+results. It would need to be modified to explicitly expose locale data
+about results.
+
+It does currently support supplying the locale of input queries using
+the language parameter to [geocode_forward_new_for_params].
+
+### Address completion
+
+Address completion is a complex topic, both because it requires being
+able to parse partially-complete addresses, and because the datasets
+required to answer completion queries are large.
+
+Nominatim does not provide dedicated address completion services, but it
+is possible to implement them in a separate web service using a filtered
+version of the OpenStreetMap database. An example is available as
+[Photon]. Google also provides a paid-for web service for address
+completion: the [Google Places Web API].
+
+As address completion is a special form of forward geocoding (i.e.
+forward geocoding operating on partial input), it should be provided as
+part of the geocoding service, and by the same backends which provide
+the geocoding functionality.
+
+If Nominatim (via geocode-glib) is found to be insufficient for address
+completion in the SDK, an Apertis-hosted Photon instance could be set
+up, and a Photon backend added to the geocoding service.
+
+On target devices, address completion should be provided by the
+automotive backend, or not provided at all if the backend does not
+implement it.
+
+The address completion API should be an extension to the existing
+geocode-glib API for forward geocoding. There must be a way for it to
+signal there are no known results. Results should be ranked by relevance
+or likelihood of a match, and should include information about which
+part of the search term caused the match (if available), to allow that
+to be highlighted in the widget.
+
+A separate library (which has a dependency on the SDK widget library)
+should provide a text entry widget which implements address completion
+using the API on the geocoding service, so that application developers
+do not have to reimplement it themselves. This could be similar to
+[GtkSearchEntry] (but using the Apertis UI toolkit).
+
+#### Address completion backends
+
+As the backends for the address completion service (i.e. the geocoding
+backends) may access sensitive data to answer queries, they must be able
+to check the permissions of the application which originated the query.
+If the application does not have appropriate permissions, they must not
+return sensitive results.
+
+For example, a backend could be added which resolves ‘home’ to the
+user’s home address, ‘work’ to their work address, and ‘here’ to the
+vehicle’s current location.
+In order to maintain the confidentiality of
+this data, applications must have permission to access the system
+address book (containing the home and work addresses), and permission to
+access the vehicle’s location (see [][Location security]).
+If the application does not have appropriate permissions, the backend
+must not return results for those queries.
+
+As normal geocoding operation is not sensitive (the results do not
+differ depending on who’s submitting a query), backends which require
+permissions like this must be implemented in a separate security domain,
+i.e. as a separate process which communicates with geocode-glib via
+D-Bus. They can get the requesting application’s unforgeable identifier
+from D-Bus requests in order to check permissions.
+
+### Geofencing
+
+We recommend that GeoClue is modified upstream to implement geofencing
+of arbitrary regions, meaning that geofencing becomes part of
+[][Geolocation]. Signals on entering or exiting a
+geofenced area should be emitted as a D-Bus signal which the application
+subscribes to. Delivery of signals to bundles which are not currently
+running may cause activation of the application.
+
+> This is intended to be provided by a system service for activation
+> of applications based on subscribed signals; the design is tracked
+> in <https://phabricator.apertis.org/T640>.
+
+The geofencing service should include a database of country borders, and
+provide a convenience API for adding a geofence for a particular country
+(or, for example, the current country and its neighbours). This would
+load the polygon for the country’s border and add it as a geofence as
+normal.
+
+The geofencing service must be available as the well-known name
+org.freedesktop.GeoClue2.Geofencing on D-Bus. It must export the
+following object:
+
+---
+/org/freedesktop/GeoClue2/Geofencing/Manager {
+  /* Adds a geofence as a latitude, longitude and radius (in metres), and
+   * returns the ID of that geofence. The dwell_time (in seconds)
+   * specifies how long the vehicle must dwell inside the geofence to be
+   * signalled as such by the GeofenceActivity signal. */
+  method AddGeofenceByRadius(in (dd) location, in u radius, in u dwell_time, out u id)
+
+  /* Adds a geofence by taking an ordered list of latitude and longitude
+   * points which form a polygon for the geofence’s boundary. */
+  method AddGeofenceByPolygon(in a(dd) points, in u dwell_time, out u id)
+
+  /* Remove some geofences. */
+  method RemoveGeofences(in au ids)
+
+  /* Return a (potentially empty) list of the IDs of the geofences the
+   * vehicle is currently dwelling in. */
+  method GetDwellingGeofences(out au ids)
+
+  /* Signal emitted every time a geofence is entered or exited, which
+   * lists the IDs of all geofences entered and exited since the previous
+   * signal emission, plus all the geofences the vehicle is currently
+   * dwelling inside. */
+  signal GeofenceActivity(out au entered, out au dwelling, out au exited)
+}
+---
+
+IDs are global and opaque — applications cannot find the area referenced
+by a particular geofence ID. A geofence may only be removed by the
+application which added it. Currently, the GeofenceActivity signal is
+received by all applications, but they cannot dereference the opaque
+geofence identifiers for other applications. In future, if
+application-level containerisation is implemented, this signal will be
+filtered per application.
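+
+From an application’s point of view, registering a geofence and
+listening for activity then reduces to one method call and one signal
+subscription. In the following minimal sketch the interface name
+org.freedesktop.GeoClue2.Geofencing.Manager is an assumption (the IDL
+above fixes only the well-known bus name and object path), and the
+coordinates and limits are illustrative:
+
+---
+#include <gio/gio.h>
+
+static void
+on_geofence_activity (GDBusConnection *connection, const gchar *sender,
+                      const gchar *path, const gchar *interface,
+                      const gchar *signal_name, GVariant *parameters,
+                      gpointer user_data)
+{
+  /* parameters is (au entered, au dwelling, au exited). */
+  g_autoptr(GVariant) entered = g_variant_get_child_value (parameters, 0);
+  g_print ("%" G_GSIZE_FORMAT " geofence(s) entered\n",
+           g_variant_n_children (entered));
+}
+
+int
+main (void)
+{
+  g_autoptr(GError) error = NULL;
+  g_autoptr(GDBusConnection) conn =
+      g_bus_get_sync (G_BUS_TYPE_SYSTEM, NULL, &error);
+  guint geofence_id = 0;
+
+  if (conn == NULL)
+    {
+      g_warning ("Could not connect to the system bus: %s", error->message);
+      return 1;
+    }
+
+  /* Add a 200 m geofence around a point, with a 60 s dwell time. */
+  g_autoptr(GVariant) reply = g_dbus_connection_call_sync (conn,
+      "org.freedesktop.GeoClue2.Geofencing",
+      "/org/freedesktop/GeoClue2/Geofencing/Manager",
+      "org.freedesktop.GeoClue2.Geofencing.Manager", /* assumed interface */
+      "AddGeofenceByRadius",
+      g_variant_new ("((dd)uu)", 52.5200, 13.4050, 200, 60),
+      G_VARIANT_TYPE ("(u)"),
+      G_DBUS_CALL_FLAGS_NONE, -1, NULL, &error);
+  if (reply == NULL)
+    {
+      g_warning ("AddGeofenceByRadius failed: %s", error->message);
+      return 1;
+    }
+  g_variant_get (reply, "(u)", &geofence_id);
+
+  /* Be told about entry, dwell and exit events. */
+  g_dbus_connection_signal_subscribe (conn,
+      "org.freedesktop.GeoClue2.Geofencing",
+      "org.freedesktop.GeoClue2.Geofencing.Manager",
+      "GeofenceActivity",
+      "/org/freedesktop/GeoClue2/Geofencing/Manager",
+      NULL, G_DBUS_SIGNAL_FLAGS_NONE,
+      on_geofence_activity, NULL, NULL);
+
+  g_autoptr(GMainLoop) loop = g_main_loop_new (NULL, FALSE);
+  g_main_loop_run (loop);
+  return 0;
+}
+---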
+
+We have informally discussed the possibility of adding geofencing with
+the GeoClue developers, and they are in favour of the idea.
+
+### Location security
+
+libchamplain is a rendering library, and does not give access to
+sensitive information.
+
+geocode-glib is a thin interface to a web service, and does not give
+access to sensitive information. All web service requests must be
+secured with best web security practices, such as correct use of HTTPS,
+and sending a minimum of identifiable information to the web service.
+
+GeoClue provides access to sensitive information about the vehicle’s
+location. It currently allows limiting the accuracy provided to
+applications as [specified by the user](http://www.freedesktop.org/software/geoclue/docs/gdbus-org.freedesktop.GeoClue2.Client.html#gdbus-property-org-freedesktop-GeoClue2-Client.RequestedAccuracyLevel);
+this could be extended to implement a policy determined by the capabilities requested in the
+application’s manifest.
+
+Similarly for GeoClue’s geofencing feature, when it is added — clients
+have separate access to its D-Bus API to allow them to be signalled at
+different accuracies and rates. This applies to navigation routing as
+well, as it may provide feedback to applications about progress along a
+route, which exposes information about the vehicle’s location.
+
+Application bundles asking for fine-grained location data should be
+subjected to closer review when submitted to the Apertis application
+store.
+
+### Systemic security
+
+As the geo-features can source information from application bundles,
+they form part of the security boundary around application bundles.
+
+In order to avoid denial-of-service attacks from an application bundle
+which emits too much data as, for example, a general point of interest
+stream, the system should rate-limit such streams in time (number of
+POIs per unit time) and space (number of POIs per map area); a sketch of
+one possible limiter follows the permission list below.
+
+Location updates emitted by GeoClue must be rate-limited between the
+minimum and maximum distance and time limits set by each client. These
+limits must be checked to ensure that a client is not requesting updates
+too frequently.
+
+For the components in the system which provide access to sensitive
+information (the vehicle’s location), a security boundary needs to be
+defined between them and application bundles. Geolocation, navigation
+route lists, route guidance and geofencing are the sensitive APIs —
+these are all implemented as services, so the D-Bus APIs for those
+services form the security boundary. In order to use any of these
+services, an application must have the appropriate permission in its
+manifest. Address completion as a whole is not treated as sensitive, but
+some of its backends may be sensitive, and perform their own checks
+according to other permissions (which may include those below). The
+following permissions are suggested:
+
+  - *content-hand-over*: Required to use the content hand-over API for
+    setting a new [navigation route][Navigation routing].
+
+  - *location*: Required to access the geolocation, geofencing, or
+    navigation route list services.
+
+  - *navigation-route*: Required to access the navigation route list
+    services (note that the *location* permission is also required).
+
+  - *navigation-guidance*: Required to access the navigation guidance
+    services (note that the *location* permission is also required).
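+
+As suggested above, one possible shape for the temporal limit on point
+of interest streams is a token bucket per application: tokens refill at
+the permitted POI rate up to a maximum burst size, and each accepted POI
+consumes one token. This is a minimal sketch of the general technique
+only; the structure and function names are illustrative rather than part
+of any proposed API, and the spatial limit (POIs per map area) would
+need separate tracking:
+
+---
+#include <stdbool.h>
+#include <time.h>
+
+/* Illustrative token bucket: at most `rate` POIs per second on average,
+ * with bursts of up to `capacity` POIs. One bucket per application. */
+typedef struct {
+  double tokens;         /* tokens currently available */
+  double capacity;       /* maximum burst size */
+  double rate;           /* tokens added per second */
+  struct timespec last;  /* time of the previous check */
+} TokenBucket;
+
+static bool
+token_bucket_allow_poi (TokenBucket *bucket)
+{
+  struct timespec now;
+  clock_gettime (CLOCK_MONOTONIC, &now);
+
+  /* Refill in proportion to the time elapsed since the last POI. */
+  double elapsed = (now.tv_sec - bucket->last.tv_sec)
+                   + (now.tv_nsec - bucket->last.tv_nsec) / 1e9;
+  bucket->tokens += elapsed * bucket->rate;
+  if (bucket->tokens > bucket->capacity)
+    bucket->tokens = bucket->capacity;
+  bucket->last = now;
+
+  if (bucket->tokens < 1.0)
+    return false;  /* over the limit: reject or queue this POI */
+
+  bucket->tokens -= 1.0;
+  return true;
+}
+---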
+
+The libchamplain layers which applications can use (see
+[here][Route list layer on the 2D map] and [here][Vehicle location layer on the 2D map])
+are treated as application code which is using the relevant
+services (geolocation and the navigation route list), and hence the
+*location* permission (or both the *location* and *navigation-route*
+permissions) is required to use them. Similarly for the horizon library.
+
+Any service which accepts data from applications, such as points of
+interest or waypoints to add into the route list, must check that the
+user has not prevented that application from providing data. If the user
+has disabled it, the data must be ignored and an error code returned to
+the application if the API allows it, to indicate to the application
+that sharing was prevented.
+
+### Testability
+
+There are several components to testability of geo-functionality. Each
+of the components of this system needs to be testable as they are
+developed, and as new backends are developed for them (including testing
+the backends themselves). Separately, application developers need to be
+able to test their applications’ behaviour in a variety of simulated
+geo-situations.
+
+#### Testing geo-functionality
+
+Each of the services must have unit tests implemented. If the service
+has backends, a mock backend should be written which exposes the backend
+API over D-Bus, and integration tests for that service can then feed
+mock data in via D-Bus and check that the core of the service behaves
+correctly.
+
+> Just as libfolks’ dummy backend is used in its unit tests,
+> <https://git.gnome.org/browse/folks/tree/backends/dummy>
+
+#### Testing backends
+
+Each service which has backends must implement a basic compliance test
+suite for its backends, which will load a specified backend and check
+that its public API behaves as expected. The default backends will
+further be tested as part of the integration tests for the entire
+operating system.
+
+#### Testing applications
+
+In order to allow application developers to test their applications with
+mock data from the geo-services, there must be an emulator program
+available in the SDK which uses the mock backend for each service to
+feed mock data to the application for testing.
+
+For example, the emulator program could display a map and allow the
+developer to select the vehicle’s current location and the accuracy of
+that location. It would then feed this data to the mock backend of the
+geolocation service, which would pass it to the core of the geolocation
+service as if it were the vehicle’s real location. This would then be
+passed to the application under test as the vehicle’s location, by the
+SDK geolocation API.
+
+The emulator could additionally allow drawing a route on the map, which
+it would then send to the mock backend for the route list API as the
+current route — this would then be passed to the application under test
+as the vehicle’s current route.
+
+This means that the API of the mock backend for each service must be
+stable and well defined.
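+
+To make the testing flow concrete, a test could drive the mock
+geolocation backend over D-Bus and then assert on what the SDK API
+reports. Every name in this sketch (the bus name, object path, interface
+and method) is hypothetical, since the mock backend APIs are yet to be
+defined:
+
+---
+#include <gio/gio.h>
+
+/* Hypothetical sketch: feed a mock location to the geolocation service
+ * under test. org.apertis.MockGeolocationBackend1 and SetLocation are
+ * assumed names; the real mock backend API is not yet defined. */
+static void
+feed_mock_location (GDBusConnection *conn, double latitude,
+                    double longitude, double accuracy_in_metres)
+{
+  g_autoptr(GError) error = NULL;
+  g_autoptr(GVariant) reply = g_dbus_connection_call_sync (conn,
+      "org.apertis.MockGeolocationBackend1",   /* assumed bus name */
+      "/org/apertis/MockGeolocationBackend1",  /* assumed object path */
+      "org.apertis.MockGeolocationBackend1",   /* assumed interface */
+      "SetLocation",                           /* assumed method */
+      g_variant_new ("(ddd)", latitude, longitude, accuracy_in_metres),
+      NULL, G_DBUS_CALL_FLAGS_NONE, -1, NULL, &error);
+
+  if (reply == NULL)
+    g_warning ("Could not set mock location: %s", error->message);
+
+  /* A test would now query the SDK geolocation API (GeoClue) and assert
+   * that it reports the location fed in above. */
+}
+---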
+
+### Requirements
+
+This design fulfils the following requirements:
+
+  - [][Geolocation API]
+    — use GeoClue
+
+  - [][Geolocation service supports signals]
+    — use GeoClue; augment its signals
+
+  - [][Geolocation implements caching]
+    — to be added to GeoClue
+
+  - [][Geolocation supports backends]
+    — GeoClue supports backends
+
+  - [][Navigation routing API]
+    — use content hand-over design, passing a nav URI to the navigation application
+
+  - [][Navigation routing supports different navigation applications]
+    — content hand-over supports setting different applications as the
+    handlers for the nav URI scheme
+
+  - [][Navigation route list API]
+    — new D-Bus API which is implemented by the navigation application backend
+
+  - [][Navigation route guidance API]
+    — new D-Bus API implemented by the system UI (i.e. the compositor) which is called by the
+    navigation application
+
+  - [][Type-ahead search and address completion supports backends]
+    — implemented as part of geocoding, to be added to geocode-glib
+
+  - [][Geocoding supports backends]
+    — to be added to geocode-glib
+
+  - [][SDK has default implementations for all backends]
+    — Gypsy or a mock backend for geolocation; custom online Nominatim server for
+    geocoding; online OpenStreetMap for 2D maps; libnavit for 3D maps,
+    **subject to further evaluation**; custom mock backend for
+    navigation route list; custom online Nominatim or Photon server for
+    address completion
+
+  - [][SDK APIs do not vary with backend]
+    — GeoClue API for geolocation; geocode-glib for geocoding; libchamplain for 2D maps;
+    libnavit for 3D maps, **subject to further evaluation**; content
+    hand-over API for navigation routing; new API for navigation route
+    lists and guidance; new API extensions to geocode-glib for address
+    completion
+
+  - [][Third-party navigation applications can be used as backends]
+    — backends are implemented as loadable libraries installed by the
+    navigation application
+
+  - [][Backends operate asynchronously]
+    — backends are implemented over D-Bus so are inherently asynchronous
+
+  - [][2D map rendering API]
+    — use libchamplain with a local or remote tile store
+
+  - [][2.5D or 3D map rendering API]
+    — use libnavit, **subject to further evaluation**
+
+  - [][Reverse geocoding API]
+    — use geocode-glib
+
+  - [][Forward geocoding API]
+    — use geocode-glib
+
+  - [][Type-ahead search and address completion API]
+    — to be added to geocode-glib
+
+  - [][Geofencing API]
+    — to be implemented as a new feature in GeoClue
+
+  - [][Geofencing service can wake up applications]
+    — to be implemented as a new feature in GeoClue
+
+  - [][Geofencing API signals on national borders]
+    — to be added as a data set in GeoClue
+
+  - [][Geocoding API must be locale-aware]
+    — to be added to geocode-glib to expose existing OpenStreetMap localised data
+
+  - [][Geolocation provides error bounds]
+    — GeoClue provides accuracy information, but it needs augmenting
+
+  - [][Geolocation implements dead-reckoning]
+    — to be added to GeoClue
+
+  - [][Geolocation uses navigation and sensor data if available]
+    — to be added as another backend to GeoClue
+
+  - [][General points of interest streams are queryable] — to be
+    designed and implemented as part of the [points of interest design]
+
+  - [][Location information requires permissions to access]
+    — to be implemented as manifest permissions for application bundles
+
+  - [][Rate limiting of general point of interest streams]
+    — security boundary
implemented as D-Bus API boundary; rate limiting applied on
   signal emission and processed general point of interest streams

 - [][Application provided data requires permissions to create]
   — all geo-services must check settings before accepting data from
   applications

### Suggested roadmap

As the SDK APIs for geo-features are, for the most part, provided by
FOSS components which are available already, the initial deployment of
geo-features requires GeoClue, geocode-glib and libchamplain to be
packaged for the distribution, if they are not already.

The second phase would require modification of these packages to
implement missing features and additional backends. This can
happen once the initial packaging is complete, as the packages fulfil
most of Apertis’ requirements in their current state. This requires the
address completion APIs to be added to geocode-glib, and the
geofencing APIs to be added to GeoClue, amongst other changes. These API
additions should be prioritised over other work, so that application
development (and documentation about application development) can begin.

This second phase includes modifying the packages to be
container-friendly, so that they can be used by compartmentalised apps
without leaking sensitive data from one app to another. This requires
further in-depth design work, but should require fairly self-contained
changes.

The second phase also includes writing the new services, such as the
[][Navigation route list API], the [][Navigation route guidance API]
and the [][Horizon API].

## Open questions

1. What review checks should be performed on application bundles
   which request permissions for location data?

2. Is it possible to use NavIt as a stand-alone 2.5D or 3D map
   widget?

## Summary of recommendations

As discussed in the sections above, the recommendations are:

 - Packaging and using libchamplain for 2D map display.

 - Adding a route list layer for libchamplain.

 - Adding a vehicle location layer for libchamplain.

 - Packaging and using libnavit for 3D map display, **subject to
   further investigation**.

 - Packaging and using GeoClue for geolocation. It needs measurement
   times, cached location support, dead-reckoning and more measurements
   and uncertainty bounds to be added upstream.

 - Adding minimum and maximum update periods for clients to upstream
   GeoClue, alongside the existing distance threshold API for location
   update signals.

 - Adding a new navigation application backend to GeoClue to implement
   snap-to-road refinement of its location.

 - Adding a new mock backend to GeoClue for the SDK.

 - Implementing a library for parsing place and nav URIs and
   formalising the specifications so they may be used by third-party
   navigation applications.

 - Adding support for place and nav URIs to the content hand-over
   service (Didcot) and adding support for a default navigation
   application.

 - Implementing a navigation route list service with support for
   loadable backends, including a mock backend for the SDK.

 - Implementing a navigation route guidance API in the system
   compositor.

 - Implementing a horizon library for aggregating and filtering points
   of interest with the route list, suitable for use by applications.

 - Packaging and using geocode-glib for forward and reverse geocoding.
   It needs support for exposing localised place names to be added
   upstream.
+
 - Adding support for loadable backends to geocode-glib,
   including a mock backend for the SDK.

 - Auditing geocode-glib to ensure it maintains data privacy by, for
   example, using TLS for all requests and correctly checking
   certificates.

 - Auditing GeoClue to ensure it maintains data privacy by, for
   example, using TLS for all requests and correctly checking
   certificates.

 - Implementing an address completion API in geocode-glib and
   implementing it in the Nominatim and mock backends.

 - Implementing an address completion widget based on the address
   completion API.

 - Implementing a geofencing API in upstream GeoClue.

 - Integrating the geo-services with the app service proxy to apply
   access control rules to whether applications can communicate with it
   to retrieve potentially sensitive location or navigation data. Only
   permit this if the appropriate permissions have been set on the
   application bundle’s manifest.

 - Implementing an emulator program in the SDK for controlling mock
   data sent by the SDK geo-APIs to applications under test.

 - Providing integration tests for all geo-functionality.

## Appendix: Recommendations for third-party navigation applications

While this design explicitly does not cover providing SDK APIs purely to
be used in implementing navigation applications, various recommendations
have been considered for what navigation applications probably should
do. These are non-normative, provided as suggestions only.

 - Support different types of vehicle — the system could be deployed in
   a car, or a motorbike, or an HGV, for example. Different roads are
   available for use by these different vehicles.

 - Support calculating routes between the start, waypoints and
   destination using public transport, in addition to the road network.
   For example, this could be used to provide a comparison against the
   car route; or to incorporate public transport schemes such as
   park-and-ride into route suggestions.

 - Support audio turn-by-turn navigation, reading out instructions to
   the driver as the relevant junctions are approached. This allows the
   driver to avoid looking at the IVI screen to see the map, allowing
   them to focus on the road.

 - Route guidance must continue even if another application takes the
   foreground focus on the IVI system, meaning the guidance system must
   be implemented as an agent.

 - Support different optimisation strategies for route planning:
   minimal travel time, scenic route, minimal cost (for fuel), etc.

 - In order to match the driver’s view out of the windscreen, the map
   displayed when navigating should be a 3D projection to better
   emphasise navigational information which is coming up sooner (for
   example, the nearest junction or road signs).

 - Support changing the destination mid-journey and calculating a new
   route.

 - Support navigating via zero or more waypoints before reaching the
   final destination. Support adding new waypoints mid-journey.

 - Provide the driver with up-to-date information about the estimated
   travel time and distance left on their route, plus the total elapsed
   travel time since starting the route. This allows the driver to work
   out when to take rest breaks from driving.

 - Detect when the driver has taken a wrong turning while navigating,
   and recalculate the route to bring them back on-route to their
   destination, reoptimising the route for minimal travel time (or some
   other criterion) from their new location.
The new route might be
   radically different from the old one if this is more optimal. The
   route recalculation should happen quickly (on the order of five
   seconds) so that the driver receives updated routing information
   promptly.

 - Support recalculating and potentially changing a navigation route if
   traffic conditions change and make the original route take
   significantly longer than an alternative. There must be some form of
   hysteresis on these recalculations so that two routes which have
   very similar estimated travel times, but which keep alternating as
   the fastest, do not continually replace each other as the suggested
   route. The application may ask or inform the driver about route
   recalculations, as the driver may be able to assess and predict the
   traffic conditions better than the application.

 - Provide information to the driver about the roads the vehicle is
   travelling along or heading towards and going to turn on to in
   future, such as the road name and number, and major towns or cities
   the road leads to. This allows the driver to match up the
   turn-by-turn navigation directions with on-road signage. (This is
   often known as route guidance or driver assistance information.)

 - If the driver takes a rest break during a navigation route, and
   turns the vehicle off, the application must give the driver the
   option to resume the navigation route when the vehicle is turned on
   again. The route must be recalculated from the vehicle’s current
   location to ensure the resumed route is still optimal for current
   traffic conditions.

 - Support cancelling the navigation when the vehicle is turned on
   again, at which point all navigation and turn-by-turn directions
   stop.

 - If the vehicle is driven abroad, the navigation application should
   provide locale-sensitive navigation information, such as speed
   limits in the local units, and road descriptions which match the
   local signage conventions.

 - Support feeding geolocation data into the system geolocation
   service, if such data may be more precise than the raw GPS
   positions; for example, if it can be snapped to the nearest road.

 - Support feeding other geo-information to the other geo-services,
   such as answering geocoding queries or performing geo-fencing.
   Support being a full replacement for the inbuilt navigation
   application and all the SDK services it provides.

 - Query the system POI API for restaurants and toilets at times and
   frequencies suitable for recommending food or toilet breaks to the
   driver. Allow the driver to dismiss or disable these,
   or to change the intervals. Do not recommend a break if the journey
   is predicted to end soon. Launch the application which provided the
   POI if the user clicks on a POI in the navigation application, so
   that they can perform further actions on that POI (for example, if
   it’s a restaurant, they could reserve a table). When POIs are
   displayed in the navigation application, they can be rendered as
   desired by the navigation application developer; when they are
   displayed in other applications, they are rendered as desired by
   those applications’ developers.

## Appendix: place URI scheme

The place URI scheme is a non-standard URI scheme suggested to be used
within Apertis for identifying a particular place, including a variety
of redundant forms of information about the place, rather than relying
on a single piece of information such as a latitude and longitude.
To
refer to a location by latitude and longitude *only*, use the standard
[geo URI scheme].

A place URI is a non-empty human-readable string describing the
location, followed by zero or more key–value pairs providing additional
information. The key–value pairs are provided as parameters as defined
in [RFC 5870],
i.e. each parameter is separated by a semicolon, keys
and values are separated by an equals sign, and percent-encoding is used
to encode reserved characters.

The location string must be in the format `1*paramchar`, as defined in
RFC 5870. All non-ASCII characters in the string must be
[percent-encoded][RFC5870-percent-escape],
and implementations must interpret the decoded
string as [UTF-8].

Implementations may support the following parameters, and must ignore
unrecognised parameters, as more may be added in future. All non-ASCII
characters in parameter keys and values must be percent-encoded, and
implementations must interpret the decoded strings as UTF-8. The
semicolon and equals sign separators must not be percent-encoded. The
ordering of parameters does not matter, unless otherwise specified. Each
parameter may appear zero or one times, unless otherwise specified.

 - `location`: the latitude and longitude of the place, as a geo URI
   (*with* the `geo:` scheme prefix)

 - `postal-code`: the postal code for the place, in the country’s postal
   code format

 - `country`: the [ISO 3166-1 alpha-2] country code

 - `region`: the human-readable name of a large administrative region of
   the country, such as a state, province or county

 - `locality`: the human-readable name of the locality, such as a town or
   city

 - `area`: the human-readable name of an area smaller than the locality,
   such as a neighbourhood, suburb or village

 - `street`: the human-readable street name for the place

 - `building`: the human-readable name or number of a house or building
   within the `street`

 - `formatted-address`: a human-readable formatted version of the entire
   address, intended to be displayed in the UI rather than
   machine-parsed; implementations may omit this if it is identical to
   the location string, but it will often be longer to represent the
   location unambiguously (the location string may be ambiguous or
   incomplete as it is typically user input)

### Examples

This section is non-normative. Each example is given as a fully encoded
string, followed by a breakdown into its un-encoded components.
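
As a further non-normative illustration of the rules above, this minimal
Python sketch decodes a place URI into its location string and
parameters. It splits on the unencoded separators before
percent-decoding, as the scheme requires, and assumes well-formed input:

```python
from urllib.parse import unquote

def parse_place_uri(uri):
    """Split a place URI into its location string and parameters.

    Illustrative sketch only: assumes well-formed input, and does not
    reject repeated or unrecognised parameters.
    """
    if not uri.startswith('place:'):
        raise ValueError('not a place URI')
    # Separators are never percent-encoded, so split before decoding.
    location, _, rest = uri[len('place:'):].partition(';')
    params = {}
    for field in filter(None, rest.split(';')):
        key, _, value = field.partition('=')
        # Percent-encoded bytes decode to UTF-8, as the scheme requires.
        params[unquote(key)] = unquote(value)
    return unquote(location), params

# parse_place_uri('place:Paris;location=geo%3A48.8567%2C2.3508;country=FR')
# -> ('Paris', {'location': 'geo:48.8567,2.3508', 'country': 'FR'})
```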
+
- `place:Paris`
  - Location string: Paris
  - No parameters

- `place:Paris;location=geo%3A48.8567%2C2.3508;country=FR;formatted-address=Paris%2C%20France`
  - Location string: Paris
  - Parameters:
    - `location`: `geo:48.8567,2.3508`
    - `country`: FR
    - `formatted-address`: Paris, France

- `place:K%C3%B6nigsstieg%20104%2C%2037081%20G%C3%B6ttingen;location=geo%3A51.540060%2C9.911850;country=DE;locality=G%C3%B6ttingen;postal-code=37081;street=K%C3%B6nigsstieg;building=104;formatted-address=K%C3%B6nigsstieg%20104%2C%2037081%20G%C3%B6ttingen%2C%20Germany`
  - Location string: Königsstieg 104, 37081 Göttingen
  - Parameters:
    - `location`: `geo:51.540060,9.911850`
    - `country`: DE
    - `locality`: Göttingen
    - `postal-code`: 37081
    - `street`: Königsstieg
    - `building`: 104
    - `formatted-address`: Königsstieg 104, 37081 Göttingen, Germany

- `place:CN Tower;location=geo%3A43.6426%2C-79.3871;formatted-address=301%20Front%20St%20W%2C%20Toronto%2C%20ON%20M5V%202T6%2C%20Canada`
  - Location string: CN Tower
  - Parameters:
    - `location`: `geo:43.6426,-79.3871`
    - `formatted-address`: 301 Front St W, Toronto, ON M5V 2T6, Canada

## Appendix: nav URI scheme

The nav URI scheme is a non-standard URI scheme suggested to be used
within Apertis for identifying a navigation route, including its
destination, intermediate destinations (waypoints) and points or areas
to route via but which are not named destinations (via-points). Each
point or area may be provided as a place or a location.

A nav URI is a non-empty destination place, followed by zero or more
key–value pairs providing additional information. The key–value pairs
are provided as parameters as defined in [RFC 5870],
i.e. each parameter is separated by a semicolon, keys and values are separated by
an equals sign, and percent-encoding is used to encode reserved
characters.

The destination place must be provided as a place URI
(see [][Appendix: place URI scheme]; *with* the `place:` URI prefix),
or as a geo URI (*with* the `geo:` URI prefix); and
must be encoded in the format `1*paramchar`, as defined in RFC 5870; i.e.
all non-ASCII and reserved characters in the string must be
percent-encoded.

Implementations may support the following parameters, and must ignore
unrecognised parameters, as more may be added in future. All non-ASCII
characters in parameter keys and values must be percent-encoded, and
implementations must interpret the decoded strings as UTF-8. The
semicolon and equals sign separators must not be percent-encoded. The
ordering of parameters does not matter, unless otherwise specified. Each
parameter may appear zero or one times, unless otherwise specified.

 - `description`: a human-readable description of the route, intended to
   be displayed in the UI rather than machine-parsed

 - `way`: a named intermediate destination, as a place URI (*with* the
   `place:` scheme prefix) or as a geo URI (*with* the `geo:` scheme
   prefix); these parameters are order-dependent (see below)

 - `via`: a non-named intermediate routing point, as a place URI (*with*
   the `place:` scheme prefix) or as a geo URI (*with* the `geo:` scheme
   prefix); these parameters are order-dependent (see below)

The `way` and `via` parameters are order-dependent: they will be added to
the route in the order they appear in the nav URI. Way-places and
via-places may be interleaved — they form a single route. The
destination place always forms the final point in this route.
The `way`
and `via` parameters may each appear zero or more times.

Additional routing-specific parameters can be added. If those parameters
are not provided, the default value is left to the routing engine; it
may be different for each type of vehicle, or due to other logic in the
routing engine.

Many parameters represent a single value; for example, it is not meaningful to
specify both `optimize=fastest` and `optimize=shortest`. URIs with multiple values
for a single-valued parameter, for example
`nav:place:Home;optimize=fastest;optimize=shortest`, should be treated as though
that parameter was not present. Apertis does not currently define any
multi-valued parameters, so all values should be specified at most once;
however, OEMs may define and use their own multi-valued parameters,
following the naming convention defined below.

Boolean values can be specified as `1` and `0` (or, equivalently, the
case-insensitive `true` and `false`). Other values are considered invalid.

 - `vehicle`: vehicle for which the route should be calculated. Apertis defines
   `car`, `walk`, and `bike`.
   `car` routes are optimised for travel by car.
   `walk` routes are optimised for walking.
   `bike` routes are optimised for bicycles.

 - `optimize`: optimises route calculation towards a set of criteria. Apertis
   defines the `fastest` and `shortest` criteria.
   `fastest` routes are optimised to minimise travel duration.
   `shortest` routes are optimised to minimise travel distance.

 - `avoid-tolls`: Boolean. If true, the calculated route should avoid tolls.
   If they cannot be avoided, the handler application is responsible for
   informing the user.

 - `avoid-motorways`: Boolean. If true, the generated route should avoid motorways.
   If they cannot be avoided, the handler application is responsible for
   informing the user.

 - `avoid-ferries`: Boolean. If true, the generated route should avoid ferries.
   If they cannot be avoided, the handler application is responsible for
   informing the user.

Additionally, vendor-specific parameters can be provided for vendor-specific
features.
To avoid contradictory definitions for the same parameter in different
implementations, vendor-specific parameters must be named in the form
`x-vendorname-paramname`, similar to
[the convention for extending `.desktop` files](https://specifications.freedesktop.org/desktop-entry-spec/desktop-entry-spec-latest.html#extending).
Note that unlike keys in `.desktop` files, parameter names in `nav:` URIs are
not case-sensitive: consistently using lower-case is encouraged.
For instance, one of the [examples] below assumes that a manufacturer has
introduced a boolean option `x-batmobile-avoid-detection`, and a string-valued
parameter `x-batmobile-auto-pilot-mode`.

### Examples

This section is non-normative. Each example is given as a fully encoded
string, followed by a breakdown into its un-encoded components.
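
To make the encoding concrete, here is a minimal, non-normative Python
sketch that assembles a nav URI from a destination and an ordered
parameter list. The helper names are invented for illustration; note how
the embedded place URI is percent-encoded a second time, which is why
nested escapes such as `%253A` appear in Example 3 below (a literal `:`
may also be left unencoded, as in Example 6):

```python
from urllib.parse import quote

def quote_component(text):
    # Encode everything outside RFC 5870's paramchar set; ';' and '='
    # must be escaped so they cannot be mistaken for the nav URI's own
    # separators, and '%' is escaped, which double-encodes any escapes
    # already present in a nested place URI.
    return quote(text, safe="[]:&+$")

def build_nav_uri(destination, params=()):
    """Assemble a nav URI from a destination place/geo URI and an
    ordered list of (key, value) pairs.  Illustrative only: performs
    no validation of the parameters defined above."""
    parts = [quote_component(destination)]
    parts += ['%s=%s' % (key, quote_component(value))
              for key, value in params]
    return 'nav:' + ';'.join(parts)

# build_nav_uri('place:Kings Cross station, London;locality=London',
#               [('vehicle', 'walk'), ('optimize', 'shortest')])
# -> 'nav:place:Kings%20Cross%20station%2C%20London%3Blocality%3DLondon;vehicle=walk;optimize=shortest'
```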
+
#### Example 1
`nav:place%3AKings%2520Cross%2520station%252C%2520London%3Blocality%3DLondon%3Bpostal-code%3DN19AL`
- Destination place:
  - Location string: Kings Cross station, London
  - Parameters:
    - `locality`: London
    - `postal-code`: N19AL

#### Example 2
`nav:geo%3A51.531621%2C-0.124372`
- Destination place: `geo:51.531621,-0.124372`

#### Example 3
`nav:place%3ABullpot%2520Farm%3Blocation%3Dgeo%253A54.227602%252C-2.517940;way=place%3ABirmingham%2520New%2520Street%2520station%3Blocation%3Dgeo%253A52.477620%252C-1.897904;via=place%3AHornby%3Blocation%3Dgeo%253A54.112245%252C-2.636527%253Bu%253D2000;way=place%3AInglesport%252C%2520Ingleton%3Bstreet%3DThe%2520Square%3Bbuilding%3D11%3Blocality%3DIngleton%3Bpostal-code%3DLA63EB%3Bcountry%3DGB`
- Destination place:
  - Location string: Bullpot Farm
  - Parameters:
    - `location`: `geo:54.227602,-2.517940`
- Parameters:
  - `way`:
    - Location string: Birmingham New Street station
    - Parameters:
      - `location`: `geo:52.477620,-1.897904`
  - `via`:
    - Location string: Hornby
    - Parameters:
      - `location`: `geo:54.112245,-2.636527;u=2000`
  - `way`:
    - Location string: Inglesport, Ingleton
    - Parameters:
      - `street`: The Square
      - `building`: 11
      - `locality`: Ingleton
      - `postal-code`: LA63EB
      - `country`: GB

#### Example 4
`nav:geo%3A51.531621%2C-0.124372;vehicle=walk;optimize=shortest`
- Destination place: `geo:51.531621,-0.124372`
- Parameters:
  - `vehicle`: `walk`
  - `optimize`: `shortest`

#### Example 5
`nav:geo%3A51.531621%2C-0.124372;vehicle=car;avoid-tolls=false;x-batmobile-auto-pilot-mode=full;x-batmobile-avoid-detection=true`
- Destination place: `geo:51.531621,-0.124372`
- Parameters:
  - `vehicle`: `car`
  - `avoid-tolls`: `false`
  - `x-batmobile-avoid-detection`: `true`
  - `x-batmobile-auto-pilot-mode`: `full`

#### Example 6
`nav:place:Cambridge;x-myvendor-avoid-road=A14;x-myvendor-avoid-road=M11`
- Destination place: Cambridge
- Parameters:
  - `x-myvendor-avoid-road`: multi-valued: A14, M11

[automotive domain]: https://wiki.apertis.org/mediawiki/index.php/Glossary#automotive-domain

[point of interest stream]: https://wiki.apertis.org/Points_of_interest#General_POI_providers

[Points of interest design]: https://wiki.apertis.org/Points_of_interest

[iOS-beacon]: https://developer.apple.com/library/ios/documentation/UserExperience/Conceptual/LocationAwarenessPG/RegionMonitoring/RegionMonitoring.html#//apple_ref/doc/uid/TP40009497-CH9-SW1

[W3C Geolocation API]: http://www.w3.org/TR/geolocation-API/

[Android platform location API]: http://developer.android.com/guide/topics/location/strategies.html

[Google Location Services API for Android]: http://developer.android.com/training/location/index.html

[iOS Location Services and Maps API]: https://developer.apple.com/library/ios/documentation/UserExperience/Conceptual/LocationAwarenessPG/Introduction/Introduction.html#//apple_ref/doc/uid/TP40009497-CH1-SW1

[MapKit API]: https://developer.apple.com/library/ios/documentation/UserExperience/Conceptual/LocationAwarenessPG/MapKit/MapKit.html#//apple_ref/doc/uid/TP40009497-CH3-SW1

[iOS-turn-by-turn]: https://developer.apple.com/library/ios/documentation/UserExperience/Conceptual/LocationAwarenessPG/ProvidingDirections/ProvidingDirections.html#//apple_ref/doc/uid/TP40009497-CH8-SW5

[iOS-local-search]:
https://developer.apple.com/library/ios/documentation/UserExperience/Conceptual/LocationAwarenessPG/EnablingSearch/EnablingSearch.html#//apple_ref/doc/uid/TP40009497-CH10-SW1

[GeoClue]: http://freedesktop.org/wiki/Software/GeoClue/

[Mozilla Location Service]: https://wiki.mozilla.org/CloudServices/Location

[geoclue-signals]: http://www.freedesktop.org/software/geoclue/docs/gdbus-org.freedesktop.GeoClue2.Client.html#gdbus-signal-org-freedesktop-GeoClue2-Client.LocationUpdated

[Geocode-glib]: https://developer.gnome.org/geocode-glib/stable/

[hard-coded to query nominatim.gnome.org]: https://bugzilla.gnome.org/show_bug.cgi?id=756311

[Nominatim API]: http://wiki.openstreetmap.org/wiki/Nominatim

[libchamplain]: https://wiki.gnome.org/Projects/libchamplain

[NavIt]: http://www.navit-project.org/

[GraphHopper]: https://graphhopper.com/

[OSRM]: http://project-osrm.org/

[YOURS]: http://wiki.openstreetmap.org/wiki/YOURS

[IVI Navigation project]: http://projects.genivi.org/ivi-navigation/documentation

[Google Maps geocoding API]: https://developers.google.com/maps/documentation/geocoding/intro

[Google Places API]: https://developers.google.com/places/web-service/

[Google Maps Roads API]: https://developers.google.com/maps/documentation/roads/intro

[Google Maps Geolocation API]: https://developers.google.com/maps/documentation/geolocation/intr

[geoclue-accuracy-bounds]: http://www.freedesktop.org/software/geoclue/docs/gdbus-org.freedesktop.GeoClue2.Location.html#gdbus-property-org-freedesktop-GeoClue2-Location.Accuracy

[geoclue-location-props]: http://www.freedesktop.org/software/geoclue/docs/gdbus-org.freedesktop.GeoClue2.Location.html

[geoclue-location-signal]: http://www.freedesktop.org/software/geoclue/docs/gdbus-org.freedesktop.GeoClue2.Client.html#gdbus-signal-org-freedesktop-GeoClue2-Client.LocationUpdated

[geoclue-distance-threshold]: http://www.freedesktop.org/software/geoclue/docs/gdbus-org.freedesktop.GeoClue2.Client.html#gdbus-property-org-freedesktop-GeoClue2-Client.DistanceThreshold

[content hand-over]: https://wiki.apertis.org/Content_hand-over

[geo URI scheme]: http://tools.ietf.org/html/rfc5870

[content hand-over design]: https://wiki.apertis.org/Content_hand-over

[Notifications specification]: https://people.gnome.org/~mccann/docs/notification-spec/notification-spec-latest.html

[Nominatim server]: http://wiki.openstreetmap.org/wiki/Nominatim

[geocode_forward_new_for_params]: https://developer.gnome.org/geocode-glib/stable/GeocodeForward.html#geocode-forward-new-for-params

[Photon]: http://photon.komoot.de/

[Google Places Web API]: https://developers.google.com/places/web-service/autocomplete

[GtkSearchEntry]: https://developer.gnome.org/gtk3/stable/GtkSearchEntry.html

[RFC5870]: http://tools.ietf.org/html/rfc5870#section-3.3

[RFC5870-percent-escape]: http://tools.ietf.org/html/rfc5870#section-3.5

[UTF-8]: http://www.unicode.org/versions/Unicode6.0.0/ch03.pdf

[ISO 3166-1 alpha-2]: http://www.iso.org/iso/country_codes.htm
diff --git a/content/designs/global-search.md b/content/designs/global-search.md
new file mode 100644
index 0000000000000000000000000000000000000000..8865ed6d2e803bcbe22cc7ec9a547df3551c7b60
--- /dev/null
+++ b/content/designs/global-search.md
@@ -0,0 +1,933 @@
---
title: Global search
short-description: Guidelines for implementing a global search system
  (unimplemented)
authors:
  - name: Derek Foreman
---

# Global search

## Introduction

Apertis will store
several types of information – media files,
documents, contacts, e-mails, applications and their preferences, chat
logs, and more. Much of this content will be stored with the application
that generates or consumes it. A file manager would be very cumbersome
for finding content in all these locations, and some of these data types
are not strictly files. A powerful search system needs to be implemented
to facilitate convenient access to the user's data.

Not all interesting information is locally stored. Apertis may be
equipped with an internet connection and the user may want their search
to include videos on YouTube or text from Wikipedia.

If a GPS device is present, search results could potentially include
nearby points of interest like gas stations, coffee shops and museums.

Compiling and displaying search results from these varied sources is
only a partial solution. The interface should also allow interaction
with the content – by launching an appropriate handling application.

The goal of this document is to define global search in the context of
Apertis and establish guidelines for implementing an effective global
search system.

## Information Classification

### Information Sources

There are two types of information sources available to Apertis for
searching:

 - Primary – sources that are used in the generation of search results.

 - Auxiliary – meta-information sources for providing further detail
   about a primary search result.

The source types can be further broken down into three storage
locations:

 - Internal – data stored directly in the embedded Apertis device.

 - External – data stored on a removable device. External devices can
   be removed and have their data altered elsewhere, so care must be
   taken when caching results.

 - Remote – data available from the internet. Availability depends on
   whether the Apertis device has network access, user preferences
   governing network use, and the status of the remote service.

### Content Categories

The results returned from primary sources may be divided into broad
categories:

 - Applications – Installed applications, applications in the currently
   running application stack, and perhaps software available from the
   application store.

 - Preferences – Application or global UI settings.

 - Documents – Spreadsheets, presentations and word processor files.
   Web pages, including the web browser's bookmarks, would also fall in
   this category.

 - Media – Photos, videos and music. This could also include radio
   stations, both broadcast and internet.

 - Contacts – E-mail, phone and chat contacts.

 - Events – Important dates from a calendar application or social media
   sites.

 - Communications – Emails, SMS and conversation logs from chat
   services.

 - Definitions – Dictionary entries and Wikipedia articles.

 - Locations – Points of interest from the navigation software and the
   current location.

Applications should provide a list of categories that apply to the
content they handle to allow the search framework to make intelligent
decisions regarding the scope of a search.

It is likely that some applications will want to extend the available
set of categories by providing new categories in their manifests.
Collabora recommends that developers wanting to add a new category be
required to seek approval from the application store.
+
Allowing developers to specify their own content categories would reduce
the search front-end's ability to combine and prioritize similar results
if applications chose different category names that mean the same thing.
The application store would be able to approve or deny any request for a
new category, and suggest re-use of an existing category if appropriate.

Even the list above isn't completely orthogonal – definitions could be a
subset of documents. Special cases like this should only be considered
if it's deemed that a clear benefit arises from the separation.

### Content Flags

Search results can contain additional Boolean properties that may apply
to all categories. Collabora recommends a collection of flags to further
qualify search results in order to allow better sorting and
presentation:

 - Favorite – Content with this tag has been selected by the user as
   high priority – favorite radio stations, contacts, e-mail threads.

 - Online – Activating some search results – such as browser bookmarks
   – would require a data connection.

 - Fee – The result leads to a service with a fee for usage. Examples
   could include long distance phone calls, or application store
   software.

As with content categories, it may be useful to allow applications to
specify new flags in their manifests. The same concerns apply here as
for categories, and the application store should carefully consider
which new flags are allowed.

### Auxiliary Information

In many cases, auxiliary data can be added to the search results either
to provide useful information to the user, or to assist the search
manager in prioritizing results more effectively:

 - Frequency/recency of usage is useful for prioritizing search
   results.

 - Presence information can be provided for contacts in search results.

 - Thumbnails can be generated for local media.

 - Weather can be provided for locations (with the current location
   either settable as a preference, or taken from a GPS device).

 - Distance from current location can be determined for locations –
   linear distance can be determined quickly, but a driving distance
   would take significantly longer.

 - More advanced auxiliary information providers could look up movie
   ratings and reviews from online services.

In some cases, such as presence information for contacts, the auxiliary
information is provided by the same library (libfolks) and at the same
time as the primary results. In other cases, the search manager may need
to query auxiliary data sources as an additional step.

Unlike flags and categories, auxiliary information can't be extended by
application manifests, since it must be fully understood by the search
framework to be displayed or utilized for priority calculations.

It is possible that a system will have multiple sources for the same
auxiliary information – perhaps a freshly installed system uses
Google for querying weather information. If a user then installs a
third-party weather application, it may be capable of providing more
accurate forecasts.

> The Google Weather API actually ceased to exist in August of 2012
> and is mentioned only for illustrative purposes.

Resolving which provider to use in situations like these may be
difficult. Some possible resolution methods would be:

 - If an application is present on the user's home screen, it will be
   selected.

 - The most recently installed application will be selected.
+
 - The HMI could provide an interface for selecting the preferred
   provider.

While HMI intervention is not a preferred option, it may not always be
possible to infer the user's preference without assistance.

## Search Priority

Not all information is of equal importance, and if a search has too
large a number of matches to display, the higher priority matches should
come first. Since there are many primary sources with differing response
times, the results must be prioritized or the fastest responders will
dominate the results list.

Having a few different priority levels to assign the different
categories to should be sufficient:

 - Top – Contacts and recently or frequently used items of all
   categories.

 - High – Media, Documents and nearby locations.

 - Medium – Applications and application settings.

 - Low – E-mails, chat logs and SMS contents.

 - Bottom – Pay-for-use services.

Within priority levels, information can be sorted with auxiliary
information. For locations, distance from current location could be a
reasonable sort criterion. For applications, the most recently used
applications should likely be higher up the list.

## Speech Recognition

Hands-free operation is a necessity in an automotive user interface, and
the global search interface needs to be implemented with that as a
primary goal. Entering arbitrary words and having the search framework
update a list of results while a request is being entered isn't possible
with speech recognition.

The search framework needs to be designed to be accessed comfortably in
two different input modalities. By providing two search methods – a full
search and a simplified keyword search – the same powerful search
mechanisms can be accessed easily by either voice or entered text.

The use of keywords for initiating and filtering searches will simplify
verbal interaction with the system and provide a fast and efficient
interface. Category names could also be recognized, allowing a quick
interface to recently used items.

Applications should provide a list of keywords in their manifests to
indicate the set of keywords they may return in their search results.
Allowing applications to add new keywords from their manifests is likely
less problematic for the search interface than new categories or flags,
and as such needs little or no application store review. However,
localization of category names and keywords is critical, since Apertis
may be deployed in multiple languages.

It may be worthwhile to hard-code some response logic, such as “weather”
launching the preferred weather application, or having a short phrase
like “switch to \<name of local radio station\>” control the radio.

It would be simpler to do this than to try to fine-tune the search
system's heuristics to cause this to naturally occur, and would prevent
installation of a new application (which might share keywords with
installed applications) from changing expected behavior.

## Guidelines

Collabora feels the following features will help create a responsive,
flexible and convenient global search interface.

### Decentralized Indexing

Trying to store all these different types of data in a single central
repository for searching presents some difficult problems:

 - If the on-disk format of the search database changes, a lengthy
   re-indexing of all searchable content must take place.
+
 - Remote content has dramatically different requirements than local
   content, and may change or disappear.

 - If an application's data is already in a conveniently searchable
   form, storing a second copy of it in a database wastes storage
   space, cache memory and processing time, and potentially decreases
   user interface responsiveness.

Apertis has special considerations as well – the application rollback
system also governs the settings and data associated with an
application. If a rollback is performed, data in a central database
would have to be purged and re-created.

Separating the search front-end from the database and allowing it to
query multiple sources for results will allow the use of many different
available components, allow searching remote content that can't be
indexed, and allow for search back-ends with different search strategies
and response times to be compiled into a single result list.

### Extendable Via Plug-ins

Many desktop search applications aggregate data from several
back-ends to produce their search results. Each source has a plug-in
specifically written to process a certain kind of data and return
standardized search results.

> [][Using existing global search software]
> provides details on some existing global search solutions.

Allowing applications to be responsible for providing search results on
their own data enables them to provide more appropriate results than if
a general-purpose service naively indexed everything on the system.

Applications would be able to provide their own plug-in, which may
communicate with an application agent, to create a custom search
back-end for the application's content.

> Agents are described in the `Software agents in Apertis` document

Further, application search databases can be stored with the rest of the
application data in a way that allows application rollback to govern
them as well, so that in the event of an application rollback, search
results will still be consistent with the data and no lengthy
re-indexing process will be required.

Some back-end plug-ins may be capable of prioritizing their results.
These priorities should be normalized for fair comparison across
plug-ins, and then used by the front end to sort results within priority
levels.

### Easy for Application Developers

Many applications will work with data that should be exposed via the
search interface, but if integrating an application with global search
is difficult then developers may do it poorly or not do it at all.

For applications using the Apertis persistence framework to store data,
it may be possible to have a single search plug-in that can mine the
persistence framework to produce results for multiple applications.

Since the applications are responsible for the structure of their data
in the persistence framework, it's difficult for a generic plug-in to
guess what data should be searchable. Applications may store sensitive
information, such as passwords, in the framework as well.

Another difficult problem is that the plug-in should be able to track
which results were selected in order to increase their priority in
future searches, but this is difficult to maintain separately from the
searchable data.

The following criteria simplify the implementation of a generic plug-in
for mining the persistence framework:

 - The persistence framework allows applications to create special
   tables for searchable data.

 - Only the contents of these tables are searchable.
+
 - The format for searchable data is dictated by the persistence
   framework and contains extra fields for use by the plug-in for
   gathering statistics.

 - The application manifest indicates whether the plug-in can search an
   application's data – even if the application uses the searchable
   data format, it may still provide its own search plug-in, and not
   wish to have its results duplicated by the generic plug-in.

In addition to allowing applications to intentionally expose data to the
search framework, if the SDK provides functionality for an application
to maintain a list of recently used items in a standard way, a generic
plug-in could use that information to provide search results.

Activation of these search results must invoke the application in such a
way that the appropriate data is immediately displayed. The application
manager and the application will have to negotiate this launch.

Searching the persistence API's storage is covered further in
[][The SDK persistence API].

### Highly Responsive

Users will expect new search results to be presented as they type, with
the result list becoming more refined the more text they enter. It is
important that the text entry always feel responsive, even if the
results are slightly delayed.

Search results may take a noticeable amount of time to accumulate. Local
results should arrive quickly, but remote results could take seconds.
Waiting for all results to be available before presenting any to the
user would result in a disappointing experience.

In order to avoid making fast responders wait for the slowest
plug-ins to finish their queries, search results should be displayed to
the user promptly as they become available.

Asynchronous coupling between primary and auxiliary sources is also
important. If a search returns a contact, the user may intend to send an
email or place a call to that contact immediately – waiting for online
status before showing that search result at all might give the
impression that the search system is slow.

An indication that search results are still being accumulated should be
presented to the user, as slow responding back-ends may take a
significant amount of time to finish, and a user may choose to wait for
more search results if they know more may become available.

It may be preferable to delay querying slow, online, or resource-heavy
search result providers until the user signifies the end of text
interaction in some way. A quickly accumulated subset of potential
search results could be displayed during text entry, with a full search
only conducted if the user hits “enter” instead of selecting a result.

This would prevent sending off a large number of resource-intensive
requests for every entered character during the time when they're likely
to be immediately invalidated by more input.

### Limited System Impact

If the search framework immediately responded to a search request by
sending requests to all available plug-ins concurrently, the resulting
spike in I/O and memory consumption would likely have detrimental
effects on system interactivity. If the search results in significant
storage device access, useful data will be pushed from system caches,
resulting in a generally sluggish system for a while after a search
takes place.

Efforts should be made to do the minimal amount of searching possible to
satisfy the user's request.
Since applications are required to specify
in their manifests what categories and keywords apply to their data, a
keyword-based search only needs to access a subset of search plug-ins.

Starting with a “shallow” search and allowing a progressively deeper
search (perhaps by touching a “more results” button, or by speaking the
word “more”) will allow the search manager to query high priority
plug-ins first, and only query lower priority plug-ins if the user is
dissatisfied with the search results.

The initial search will prefer plug-ins for applications on the home
screen and applications that are already running, as well as higher
priority search content, with subsequent, “deeper” searches progressing
to lower priority levels.

As the user performs searches and the system accumulates more
information on what plug-ins are most likely to provide the results they
choose, the “more results” function will be used less and less
frequently.

### Predictable Interaction

Rapid changes in already visible search results could result in the user
selecting an unintended item. Care should be taken to minimize movement
of search results after display.

Results should be displayed in sorted order, not displayed and then
sorted. As new items are added they may change the position of existing
items – new high priority results will push lower priority results down
the list.

Aggressive timeouts may need to be set for online sources to help
mitigate this. Search results from online sources could be given a
shared timeout, at which point the results will be ordered and injected
into the displayed list all at once.

If the result list can be navigated with up/down buttons or a similar
physical interface, then the selection should stay with the currently
selected item if new results appear. If the selection stays with the
ordinal position in the list, then an unintentional activation is much
more likely to occur.

### Balance of Configuration and Heuristics

Exposing preferences to control all aspects of the search process will
almost certainly confuse more users than it will help. Trying to
represent all the possible combinations of flags to the user in a
sensible way will likely not be possible. The ability to turn individual
search sources on and off is probably useful, and this is the way search
configuration is presented on some operating systems (OSX, Android).

If the interface is too configurable it makes testing new search
heuristics more difficult, as they need to be tested for interactions
with all possible combinations of the available settings. Giving the
user control over what is searched, but not how it's presented, should
allow some user customization while maintaining consistency for
developers.

The system should track a user's search history and use that information
to change the priority levels of content categories, and the effect of
content flags. This will allow the system to adapt to a user's
preferences over time. Since applications can add new content
categories, flags, and keywords, this will also allow these new types to
eventually find the priority level that matches the user's interest in
them.

Some system settings should affect the search system. If Apertis is
equipped with a wireless modem, the search system should obey the system
settings for wireless data usage. It might be useful to allow
finer-grained control over remote searching.
Back-ends that require network
traffic to perform a search could be presented as a single result (like:
“Search Wikipedia for: ...”). Activating that result would perform the
remote search and replace the single line with the new results as they
become available.

## Potential Search Back-ends

A significant body of search software already exists and would be
appropriate to integrate into a global search framework; some convenient
libraries and protocols exist for quickly creating new search back-ends.

The following sections provide an overview of some potential primary and
auxiliary sources. For some of them, indexing services are already
available; others don't yet have a free implementation or are
Apertis-specific.


> [][New software development] later in this document is
> intended to give an overview of which suggested components would
> require new software development.

### Primary Sources

The following software solutions bear strong consideration for inclusion
as primary search backends:

 - [Zeitgeist] - An activity logger that tracks frequently used
   content as well as chat logs. While it's possible for individual
   apps to track recently used data, Zeitgeist can track this data on a
   whole-system level.

 - [Evolution-data-server] - A component allowing access to
   calendar, tasks, and address book information.

 - [Folks] - A “meta-contact” aggregator that can return information
   for contacts across a wide array of services (including
   Evolution-data-server's contact information).

 - [Grilo] - A framework for browsing remote media.

New search backends could readily be built from:

 - [OpenSearch] - A standard for internet-based searching
   implemented by many existing searchable pages – Wikipedia, Google,
   Bing, and IMDb to name a few.

 - [Lucene++] - A generic text search engine that can be used in
   applications that want to implement their own search back-ends.

Some Apertis-specific systems are good candidates for delivering search
results:

 - Application Manager – The application manager could provide search
   for installed applications, and perhaps even allow searching running
   applications to allow a quick jump to recently used applications on
   the application stack.

 - Preference Manager – The preference manager has access to all
   application and global UI settings, and could provide these settings
   to the search framework.

 - Browser – The browser application's bookmark list should be exposed
   by the search infrastructure.

There may be times when more than one primary search source returns the
same result - the Zeitgeist activity logger, for instance, tracks
recently used content. Recently played media may be returned as a search
result from both Zeitgeist and a media indexing service. When such a
collision occurs, the two results should be combined (before consulting
auxiliary sources) and displayed as a single search result.

Some care will need to be taken in selecting how the plug-ins query
results. For example, the application and preference managers could be
queried over D-Bus since they're likely to be long-running services. The
search plug-in for browser bookmarks should directly query the bookmark
database, as it would be undesirable to launch an instance of the
browser to service a search request.
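
As an illustration of how little code a new OpenSearch-style back-end
needs, the following non-normative Python sketch queries Wikipedia's
public OpenSearch endpoint; error handling, caching and rate limiting
are omitted for brevity:

```python
import json
import urllib.parse
import urllib.request

def wikipedia_opensearch(term, limit=5):
    """Query Wikipedia's OpenSearch endpoint; return (title, url) pairs."""
    query = urllib.parse.urlencode({
        'action': 'opensearch',
        'search': term,
        'limit': limit,
        'format': 'json',
    })
    url = 'https://en.wikipedia.org/w/api.php?' + query
    with urllib.request.urlopen(url, timeout=5) as response:
        # The response is a four-element array:
        # [query, [titles], [descriptions], [urls]]
        _, titles, _, urls = json.load(response)
    return list(zip(titles, urls))

# e.g. wikipedia_opensearch('Göttingen')
```

A real back-end would additionally map each returned item onto the
content categories and flags described earlier, and apply the aggressive
timeouts recommended for online sources.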
+
### Auxiliary Sources

Once a result is provided, useful additional information can be added by
auxiliary sources:

 - Tumbler can provide thumbnails for documents and media.

 - Plugins can offer related searches, e.g. songs by the same artist or
   from the same album, similar songs, or places near a location.

 - Online services could be used to retrieve album art, lyrics, or
   movie plot synopses.

### The SDK Persistence API

The SDK will provide a persistence API to applications, which can be
used to store recently used or favorite items. The SDK persistence API
will also provide a plugin for the global search infrastructure, to
provide useful information as both a primary and an auxiliary source.

Several types of data could potentially be managed by the SDK
persistence API:

 - Favorite lists – items the user has declared to be important.

 - Recently used lists – items the user has interacted with recently.
   This is a convenience API for information stored in Zeitgeist.

 - Application-specific data – anything an application wants exposed to
   the search framework.

Data should be stored in such a way that the search result can be easily
passed to the appropriate application for launching. One possible set of
data for an item stored by the persistence API would be:

 - The information classification (as in [][Information classification])
   for the stored item.

 - The name of the item – the name of the web page a bookmark refers
   to, name of a radio station, etc. This is what will be shown as the
   search result.

 - A reference to the activatable item – a local file name, a URL, or
   other relevant data that would be passed to the application to
   activate it.

 - The time of the last usage of this piece of data (see the following
   comments).

 - Potentially some simple keywords so proprietary data can be better
   integrated with search.

 - Any additional information the application wishes to attach to this
   item – unused by the search system.

 - Any additional information used by the search subsystem, not
   modifiable by the application itself. For example, the original
   plugin that provided the item.

In practice there are several ways to decide if an item is recently
used. An application could track the last 5 documents it has been
required to open, or a web browser could track all sites it has visited
in the last 2 weeks.

Examples of favorites and recently used items for common applications:

| Application | Favorite | Recently used items |
| ------------ | -------------------- | ----------------- |
| Web browser | Bookmarks | History |
| Navigation | Favorite places | Last destinations |
| Radio | Station list | Last station |
| Weather | Favorite locations | Last location |
| Contacts | Favorite contacts | Last contacts called or messaged (sent or received) |
| Documents | Files in ~/Documents | Last opened documents |
| Media player | Playlists | Last played |
| Calendar | Next events | Last opened event |

It is recommended that, regardless of the method used to determine
recency, a date of last usage be stored in the persistence framework for
searchable items. This will allow the system to fairly prioritize
results from different applications.

Application-specific data presents a rather big challenge to the search
framework, both in terms of implementation and UI design.
While some
application concepts can be represented in intuitive ways by a generic
search interface, that will be the exception rather than the rule.
Therefore, Collabora recommends that search be limited to item names and
keywords that the application may associate with the name. More complex
searches, such as searching for music that is above a given media
rating, should be available in the application itself, otherwise the
general search will be too complex to use and implement.

## Example Search Flow

A search begins in the HMI, either by voice recognition or by
interaction with the touch screen. Before any lengthy search is
performed, hard-coded response logic is checked for simple responses
such as changing the radio station or checking the weather at the
current location.

If this logic completes the search, the appropriate action is taken and
the interface is dismissed. If there is no user selection within a
configurable time, the search engine begins performing queries of the
back-end plug-ins, starting with the highest priority information
categories (static data content), including auxiliary information.

As search results are accumulated and displayed, the user is able to
either select from the presented results, or request that the search
engine try lower priority (and potentially slower) content types to
satisfy the request (dynamic data).

It's quite possible that a single plug-in can return results of different
content types – the application manager's plug-in, for example, may
return applications as well as application preferences. The search
system must be able to tell the plug-ins that it is currently only
interested in a subset of available content types to control the
returned results.

The plugins may also suggest related search items, e.g. similar songs,
songs by the same artist, or places near a location. The UI will display
these related items as subitems. If one is selected, the search engine
will initiate a new search with the selected condition, and the search
will start over.

Once the user selects an appropriate result, the appropriate action
should be taken (for example, launching an application or changing
the radio station). The search framework should use the final selection
to assist in re-prioritizing plug-ins and categories for future
searches.

## Implementation Examples

It is not the intent of this document to dictate application design
decisions, such as file formats or storage methods for application data
(like bookmarks, calendar entries, and contact information).

However, this section provides some potential ways to provide search
results for each of the content types from [][Content categories]
and some recommendations that may make developing the search
system easier.

Collabora recommends against trying to use Tracker as an indexer for any
proprietary data formats, preferring a plug-in for the search framework
instead.

If an application changes the format of the data it wants to store, the
Tracker database would need to be updated for application management
operations. Tracker's database is not governed by the application
rollback system, so these updates would not be reversible.

Similarly, it would be preferable to avoid using Tracker to mine any new
file types, or have it index application storage areas other than the
general storage area.
Proprietary file types can instead be handled by
+agents or plug-ins provided by the applications that operate on them.
+
+Since Apertis will not have a file browser, some standard file types
+(vCard, iCalendar, GPX) should likely not be stored at all, and instead
+be consumed and deleted by the appropriate application when presented to
+the device.
+
+Allowing these formats to be stored, indexed and displayed as search
+results would create confusion when the application responsible for that
+data type also returned a similar search result. This problem is
+explained further in the following sections.
+
+### Applications
+
+For the purposes of global search, applications can very broadly be
+separated into two groups:
+
+  - Installed – results can be returned by a plug-in that uses the
+    application launcher's database of application manifests to return
+    pertinent results.
+
+  - Available from the store – a plug-in that connects to the
+    application store could locate installable applications that match
+    the user's search.
+
+### Preferences
+
+The Apertis Application Development document defines a system in which
+the settings for all applications are managed by a single app-settings
+application.
+
+Under such a system, a single plug-in could be written to provide any
+settings managed by the preference manager as search results to the
+global search front end.
+
+### Documents
+
+Document search results can be provided by several sources:
+
+  - Local documents in standard formats will be returned by the system
+    indexer.
+
+  - Favorite and recently used files and web pages can be returned by
+    the SDK persistence API search plug-in.
+
+  - A plug-in could perform a Google search.
+
+  - Data in proprietary formats could be searched by
+    application-specific plug-ins.
+
+### Media
+
+The Media Management Design deals specifically with the handling of
+media content via a combination of Tracker, Tumbler and Grilo.
+
+Radio station results could be provided by the SDK persistence API.
+Tracker also has an ontology for radio stations, so storing station data
+there is an option.
+
+### Contacts
+
+The Contacts design defines an approach to contact management based on a
+libfolks front end. A plug-in using libfolks could be created for the
+global search system to provide contacts as search results.
+
+A file format – vCard (.vcf, .vcard) – exists for the exchange of
+contact information. If it's deemed necessary to index these for some
+reason, it should be noted that:
+
+  - “Activating” a vCard file generally results in adding a contact to a
+    contact database – which is quite likely not what the user is trying
+    to do via the search interface.
+
+  - A vCard file may contain a subset of the information available to
+    libfolks, and will not remain in sync with it if contact information
+    is updated.
+
+  - Activating the vCard may in fact replace more recently updated
+    information in the contact system with older data.
+
+As such, a vCard file search result may be hard to distinguish from a
+contact search result, and vCard files should probably not be returned
+as results at all.
+
+### Events
+
+Like contacts, calendar events have a standardized file format for
+passing along event data – iCalendar (.ics). Also like contacts, this
+format is probably only used for synchronizing events between devices
+and is probably not the calendar application's native storage format. 
+
+Like .vcf files, .ics files should probably not be part of the returned
+search results, to avoid confusing behavior. Instead, a plug-in that
+uses the calendar application's native storage format could provide
+these results.
+
+Depending on application design decisions, a single calendar application
+might not be the only source of searchable “events” – a social media
+application might also provide search results.
+
+### Communications
+
+The applications responsible for handling phone, SMS, e-mail and instant
+messaging data can all be responsible for searching their own logs to
+provide search results.
+
+A plug-in based on libfolks could provide auxiliary information about
+the contacts involved in the communications returned by the primary
+results providers.
+
+### Definitions
+
+A plug-in could search Wiktionary via the OpenSearch API, or a
+standalone dictionary application could provide a plug-in to provide
+results from its local database.
+
+### Locations
+
+Navigation and weather software can provide favorite or recent locations
+via the persistence API's plug-in.
+
+A plug-in for the navigation software could allow searching the map data
+to return possible destinations, and a weather plug-in could be queried
+for current conditions at those locations.
+
+A weather plug-in should probably employ efficient caching, since
+searching for nearby points of interest will almost always return a
+large number of locations in the same weather reporting domain.
+
+## Using Existing Global Search Software
+
+Many search frameworks already exist, and it may be possible to re-use
+some of their code. [Unity lenses] have been singled out as a
+particularly interesting search architecture.
+
+The Unity search system consists of 3 pieces:
+
+  - The Dash – the user interface components. These are an integral part
+    of the Unity UI, which itself is a plug-in for the compiz window
+    manager.
+
+  - A collection of Lenses – search front ends which pass up result
+    lists to the user interface components. Each data type is intended
+    to have its own lens.
+
+  - A collection of Scopes – back end plug-ins that return results to
+    front end lenses. A lens can pull data from any number of scopes.
+
+Lenses and Scopes are processes launched via D-Bus to service search
+requests – though a lens may have a “local scope” built into it and not
+require any additional scopes. Both components are written in the Vala
+programming language using libunity, and must have D-Bus .service files
+so they can be demand-launched by D-Bus activation.
+
+In order to leverage the Unity Lens search infrastructure in Apertis,
+the front end components would have to be re-implemented – or the code
+from the Unity compiz plug-in could be extracted and heavily re-factored
+to fit within the Apertis UI.
+
+The existing code is heavily integrated with Unity, and may be very
+difficult to extract without having to also duplicate a lot of other
+Unity functionality. It may be easier to mimic the dash's D-Bus
+interfaces instead of trying to fit its code into Apertis.
+
+Since the lens architecture requires the user to select what kind of
+data they're searching for, in addition to UI for displaying search
+results, a method of selecting which lens to search with would also be
+required. In the Unity Dash this is known as the “lens bar”.
+
+A set of Lenses is required, one for each type of searchable data – the
+list of content categories from [][Content categories] would provide a
+good selection of lenses. 
Some of the lenses already available for Unity
+might fill these roles.
+
+Scopes would need to be created for the different data sources – such as
+a generic plug-in for mining the persistence framework. Since the
+persistence framework might contain data that fits different categories,
+multiple scopes may need to be written for it, each presenting only one
+category of information.
+
+Multiple scopes can provide results to a single lens, so, for example, a
+“communications” lens could have a back-end scope for e-mail, and
+another for SMS messages.
+
+The lens concept differs slightly from the search paradigm presented
+earlier in this design. Using lenses, the user would have to pick what
+type of data they were searching for by selecting a lens, as opposed to
+all types of data being prioritized and combined in a single list.
+
+## New Software Development
+
+To implement a global search interface like the one described in this
+document, new software components will need to be created:
+
+  - A plug-in framework for integrating search back-ends, perhaps built
+    on or with code re-used from the software discussed in
+    [][Using existing global search software]. A similar plugin
+    framework is also offered by Grilo. Although Grilo is focused on
+    multimedia content, the plugin framework could be reused and adapted
+    to serve general content, as needed by the SDK Persistence API.
+    Also, Grilo is already used within Apertis, avoiding new
+    dependencies.
+
+  - Plug-ins for the framework – many of these will be thin wrappers
+    around existing search functionality such as that listed in
+    [][Potential search back-ends]; some will be Apertis-specific and
+    require more development.
+
+  - A UI for presenting and interacting with search results.
+
+  - Preference management for the search system.
+
+[Zeitgeist]: http://zeitgeist-project.com/
+
+[Evolution-data-server]: http://www.go-evolution.org/EDS_Architecture
+
+[Folks]: https://live.gnome.org/Folks
+
+[Grilo]: https://live.gnome.org/Grilo
+
+[OpenSearch]: http://www.opensearch.org/
+
+[Lucene++]: https://github.com/luceneplusplus/LucenePlusPlus
+
+[Unity lenses]: http://developer.ubuntu.com/resources/technologies/lenses-and-scopes/
diff --git a/content/designs/hardkeys.md b/content/designs/hardkeys.md
new file mode 100644
index 0000000000000000000000000000000000000000..cedf889af3c2b8cad3618b75abb05f16464040e1
--- /dev/null
+++ b/content/designs/hardkeys.md
@@ -0,0 +1,222 @@
+---
+title: Hard keys
+short-description: Hardware keys (volume controls, home screen button ...)
+  (implemented)
+authors:
+  - name: Sjoerd Simons
+---
+
+# Definition
+
+Hardkeys, also known as hardware keys: any controls in the car system
+connected to the head unit. Examples of hardkeys are volume controls,
+fixed-function buttons (back, forward, home screen, play/pause), rotary
+controls etc.
+
+Traditionally hardkeys referred only to physical controls (e.g.
+switches, rotary dials etc), hence hardware keys. But current systems
+are far more flexible, so this design can also refer to software buttons
+rendered by the system UI outside of the application's control, or to
+touch-sensitive areas with software-controlled functionality.
+
+A simple guideline to determine whether a control falls under this
+design concept is to ask the question: "Could this functionality have
+been provided by a physical key/knob?" If the answer is yes, it fits
+into this design; otherwise it doesn't.
+
+## Out of scope
+
+For the current design the following aspects are defined as out of scope. 
+They either belong to other designs or could be addressed in future
+iterations.
+
+ - Haptic feedback
+ - Application-controlled buttons (e.g. application-configurable icons)
+ - Short vs. long key press handling (part of the UI
+   framework/toolkit)
+ - Display systems other than Wayland
+ - Any requirements about the type of controls that need to be
+   available.
+ - Implementation design; this document explains the high-level
+   concepts, not the implementation
+
+# Types of usage & example use-cases
+
+We recognize three different types of controls and the effect that they
+have on the system:
+
+ - System controls: affect the system as a whole (volume controls, home
+   key).
+ - Role controls: affect the current system context (pause/play)
+ - Application controls: affect the current foreground application
+
+The following sections provide more detail for these various types of
+controls.
+
+## System keys
+
+Buttons handled by "the system", regardless of application state.
+Examples: volume controls (always adjust the system volume), home screen
+buttons (always navigate back to the home screen), application shortcut
+buttons e.g. navigation (always open the specific application).
+
+### Example use-cases
+
+ - Bob is using a web browser and presses the "home button". The system
+   goes back to the home screen.
+ - Alice presses the "navigation" button. The system opens the
+   navigation application.
+
+## Role specific keys
+
+Buttons that should be handled by an agent or application fulfilling a
+specific role. An example here are play/pause buttons, which should get
+handled by the program currently playing audio (e.g. internet radio
+application, local media player etc).
+
+### Example use-case
+
+ - Simon starts playing music in the Spotify application, then switches
+   to the web browser application while the music stream keeps playing
+   in the background. Simon doesn't like the current song and presses
+   the "next song" button; the Spotify agent running in the background
+   switches to the next song.
+
+## Application keys
+
+Buttons that should always be handled by the currently active
+application. For example up/down/left/right select buttons.
+
+### Example use-case
+
+ - Harry is browsing through a list of potential destinations in the
+   navigation application. He turns the rotary dial on the center
+   console, and the focus moves to the next potential destination on
+   the list.
+ - Barry looks up a new radio station in the radio application. After
+   listening a while he decides he likes the current station. Barry
+   holds the "favourites" button for a while (long press), and the radio
+   application bookmarks the current station.
+
+# Non-functional requirements
+
+Apart from the use-cases mentioned above, there are several requirements
+on the design that don't directly impact the functionality but are
+important from e.g. a security point of view.
+
+ - Applications should not be able to eavesdrop on the input sent to
+   other processes
+ - Applications should not be able to inject input into other
+   processes ([Synthesized input])
+
+A more complete overview of issues surrounding input security
+(integrity, confidentiality) can be found on the
+[Compositor security] page.
+
+# Design
+
+On Wayland systems the design of input handling is relatively
+straightforward: the compositor is responsible for gathering all the
+inputs from the various sources and chooses how to handle them. 
The diagram below gives a
+high-level overview of some example input sources and examples of the
+various types of controls.
+
+
+
+Because the Wayland input flow has only two actors (the compositor and
+the receiver), as opposed to that of X11, there is no way for
+applications to either spy on the inputs of other applications or to
+inject input into other applications.
+
+## Compositor Input sources
+
+Various example inputs are shown in the diagram (though others could be
+defined as well):
+
+ - Local inputs: evdev is the kernel input subsystem; directly attached
+   controls will emit key events using that subsystem (e.g. i2c
+   attached buttons)
+ - External inputs: any input sources that aren't directly attached,
+   e.g. inputs via the CAN network or even an IP network
+ - Software inputs: software-defined input sources, e.g. onscreen
+   buttons drawn by the compositor or pre-defined touchscreen areas
+
+Note that the exact implementation of gathering input is left up to the
+implementer. E.g. for CAN inputs it's not recommended for the compositor
+to have direct access to the CAN bus; however, that design is part of
+the implementation of the generic CAN handling.
+
+Each of these input events will feed into the input processing internal
+to the compositor.
+
+## Compositor Input processing
+
+All input events are gathered into one consistent input processing step
+in the compositor. From the point of view of both the compositor
+internals and the applications/agents, the exact technology used to
+gather the inputs is not relevant (note that the source device could be,
+e.g. steering wheel controls vs. center console controls).
+
+The task of the input processing in the compositor is to determine
+where the key event should be delivered and via which method, following
+the classification outlined earlier.
+
+### System keys
+
+Keys meant for the system are processed directly by the compositor,
+resulting in the compositor taking an action either purely by itself
+(e.g. switching to an overview screen) or by calling methods on external
+services. In the example given in the diagram, the compositor processes
+the "volume up" key and as a result of that uses the libpulse API to ask
+PulseAudio to increase the system volume.
+
+### Application keys
+
+Keys meant for the current foreground application are simply forwarded
+to the application using the Wayland protocol. All further handling is
+up to the application and its toolkit; this includes, but is not limited
+to, the recognition of short press vs. long press, application-specific
+interpretation of keys (e.g. opening the bookmarks if the favourites key
+is pressed) etc.
+
+Note that the current foreground application might well be the
+compositor itself. For example, if the compositor is responsible for
+rendering the status bar, it could be possible to use key navigation to
+navigate the status bar.
+
+### Role keys
+
+Keys for a specific role are forwarded to the application or agent
+currently fulfilling that role. It is expected that each role implements
+a role-specific D-Bus interface either for direct key handling or
+commands.
+
+For example for music players, assuming MPRIS2 is mandated for media
+agents (including Radio), the compositor would determine which agent or
+application currently has the Media role and call the MPRIS2 method
+corresponding to the key on it (e.g. Next).
+
+The reason for requiring role-specific D-Bus interfaces rather than
+simply forwarding keys via e.g. 
the Wayland protocol is that an
+agent doesn't have to connect to the display server, only to the D-Bus
+session bus. It also means there is a clean separation between the
+actual key handling and the action taken. For example, an OEM may define
+a short press on a "forward" key to mean a seek, while a long press
+means "next track"; in this design that can be purely policy in the
+compositor.
+
+# Key handling recommendations
+
+Even though it is out of scope for this design specifically, some
+general recommendations about key handling follow:
+
+ - Keys should be specific to one category even when used in long vs.
+   short press or in combinations, as undoing key events is impossible.
+ - Any combination keys or keys with long-press abilities should only
+   be handled on key release (key down ignored), for the same reason:
+   cancelling an in-progress action is either confusing to the user or
+   not possible.
+
+[Compositor security]: https://wiki.apertis.org/Compositor_security
+
+[Synthesized input]: https://wiki.apertis.org/Compositor_security#Synthesize_input
diff --git a/content/designs/hwpack-requirements.md b/content/designs/hwpack-requirements.md
new file mode 100644
index 0000000000000000000000000000000000000000..846558239c8fd0886d71797c8a45ae1ce1853609
--- /dev/null
+++ b/content/designs/hwpack-requirements.md
@@ -0,0 +1,128 @@
+---
+title: HWpack Requirements
+short-description: HWpack implementation requirements and considerations.
+authors:
+  - name: Martyn Welch
+---
+
+# HWpack Requirements
+
+## Introduction
+
+This documentation covers the requirements and considerations that should be taken into account when implementing "hardware packs" for the Apertis project.
+
+## Concepts
+
+This section briefly covers the concepts and expectations of the Apertis platform.
+
+### Image Building Stages
+
+Apertis images are built from binary packages, packaged in the `.deb` format. Building these packages is expected to be carried out from source by the Apertis infrastructure, ensuring all package dependencies are properly described and reducing the risk of unexpected dependencies.
+
+The selection and packaging of these packages are predominantly driven by the needs of the two main process steps required to create images, known as the `OSpack` and `HWpack`.
+
+#### OSpack
+
+The OSpack stage generates one or more generic (architecture-specific but largely hardware-independent) archived rootfs built from Apertis packages. These rootfs archives are known as OSpacks. The process is managed by a tool called [Debos](https://github.com/go-debos/debos), which uses YAML configuration files to guide what steps it takes. Apertis provides YAML files to assemble a number of differently targeted OSpacks, ranging from a minimal GUI-less OSpack, through a target-focused GUI OSpack, to a development environment with a desktop-style GUI, and has pre-packaged the components required to generate these OSpacks.
+
+
+
+#### HWpack
+
+Unlike the OSpack step, the hardware package (HWpack) step does not result in an item known as a HWpack. The HWpack consists of a Debos script which controls the processing of an OSpack determined at run time, converting it from a hardware-independent OSpack into an image which can be successfully booted on a specific hardware platform. In addition to developing the HWpack script, the HWpack step requires the modification and packaging of the required components to perform this transformation.
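+
+For illustration, a minimal HWpack recipe following this pattern might look like the sketch below. This is not taken from a real Apertis HWpack: the package names, sizes and the bootloader installation helper are placeholders, and a real recipe needs the partitioning and boot components appropriate to its target board.
+
+```yaml
+# Illustrative sketch only: names and values are placeholders.
+architecture: armhf
+
+actions:
+  - action: unpack
+    description: Unpack the hardware-independent OSpack
+    file: ospack.tar.gz
+
+  - action: apt
+    description: Add board-specific kernel and bootloader packages
+    packages:
+      - linux-image-armmp
+      - u-boot-example-board    # hypothetical package name
+
+  - action: image-partition
+    description: Create the partition layout expected by the target
+    imagename: example-board.img
+    imagesize: 4GB
+    partitiontype: msdos
+    mountpoints:
+      - mountpoint: /
+        partition: root
+    partitions:
+      - name: root
+        fs: ext4
+        start: 2MB
+        end: 100%
+        flags: [ boot ]
+
+  - action: filesystem-deploy
+    description: Deploy the rootfs onto the partitioned image
+
+  - action: run
+    description: Install the bootloader into the image
+    chroot: false
+    command: ${RECIPEDIR}/install-u-boot.sh ${IMAGE}    # hypothetical helper script
+```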
+
+### Apertis packages
+
+Apertis standardizes on a specific set of components (with specific versions) and technologies to fulfill the needs of the target platforms. This maximizes sharing and reuse, thus minimizing the effort and cost of maintaining common components across products. Deviations from the standard selection may be needed to accommodate product-specific needs, but such deviations tend to reduce reuse and thus increase the long-term maintenance effort. It will likely fall on the product team in question to carry this added effort. As a result, it is strongly preferred for deviations to be minimized, with generic improvements and additions made to the standard components to add the required functionality where possible.
+
+The components selected as part of the base Apertis system need to meet a number of project criteria.
+
+Licensing requirements
+: Components need to be licensed in such a way that they are acceptable for distribution in target devices. For example, GPL-3 is problematic and thus avoided.
+
+Software revisions
+: The specific revisions of the packages are picked to balance the competing customer needs of having up-to-date versions (and thus features), stability and the need for a strong security road map.
+
+Close to upstream
+: Apertis aims to remain relatively close to its upstreams (where the majority of packages are based on Debian stable, and the kernel on the latest LTS release). This minimizes the effort required to migrate to newer versions as it means there are minimal patches to port. A large deviation from upstream also decreases the effectiveness of testing and the validity of review performed on upstream versions.
+
+These are some of the key packages from which the Apertis system is built:
+ - U-Boot/systemd-boot
+ - Linux kernel
+ - systemd
+ - Apparmor
+ - Wayland
+ - Mesa
+ - PulseAudio
+
+All Apertis packages are packaged using standard Debian packaging, with source code and package configuration stored in the Apertis GitLab, enabling automation of the package build process.
+
+### Typical Apertis OS layout
+
+The reference Apertis images share a common layout per architecture, enabling images to be shared across the various supported platforms of each architecture:
+ - Bootloader typically stored in flash
+ - Kernel and other boot components and configuration stored on the rootfs (enabling the current update mechanism)
+ - OSTree used as part of the update and rollback strategy (non-OSTree options available for development)
+
+It is expected that the requirements and practicalities of products based on Apertis will require deviations to be made from this layout. Such deviations, however, should be carefully considered. Some, such as storing the bootloader at the beginning of the same medium as the rootfs, carry very little impact as far as the functionality of Apertis is concerned. Others, such as using a different bootloader, storing the kernel outside of the rootfs or using a different update strategy (such as A/B partitioning), may result in a non-trivial integration effort, loss of some Apertis functionality and/or extra on-going maintenance effort.
+
+The OSpack is expected to contain common functionality to enable use of supported hardware; for example, the OSpacks which are intended to be used with an operational graphical environment include Wayland, though the hardware-specific drivers are in the HWpacks. When enabling new types of functionality, it is expected that generic support would be added to the OSpacks where applicable. 
If such functionality is widely used, this should be integrated into the Apertis OSpacks. Support for niche functionality, or functionality not of general interest to Apertis, will need to be added to a product-specific OSpack.
+
+# HWpack Components
+
+As with the OSpack (and unless specific exceptions are agreed), all components should be properly packaged and provided with source to enable debugging, extension and further optimization. It is expected that some changes may be viable to be included in the main Apertis packages, some packages may be added to the main Apertis package repositories, and others will need bespoke packages which would typically be stored in a dedicated project area, as described in the [contribution process](https://designs.apertis.org/latest/contribution-process.html) document. It is typical for the following areas to need modifications or to be provided, though other modifications may also be required.
+
+## Bootloader
+
+Apertis standardizes on U-Boot as the bootloader on all non-x86 platforms. In order to support the standard Apertis boot, update and rollback functionality it is necessary for the configuration to include the "[Generic Distro Configuration Concept](https://gitlab.apertis.org/pkg/target/u-boot/blob/apertis/v2020pre/doc/README.distro)" (often referred to as "Distro Boot"). The configuration used by this mechanism has been tweaked to work with Apertis rollback mechanisms.
+
+In order to enable efficient development, it would be advisable to ensure that access to the boot prompt is enabled, along with networking support and the PXE and DHCP boot options where applicable. (Note: U-Boot can support networking via a USB RNDIS gadget should a USB On-The-Go (USB OTG) port be available.)
+
+## Linux Kernel
+
+Apertis expects projects using it to take product security seriously; as a result, known kernel vulnerabilities need to be patched and updates made available. Apertis uses and tracks the latest upstream [longterm](https://www.kernel.org/category/releases.html) stable (LTS) kernel available at the time a release is made. The Apertis project strongly recommends that when products use their own kernel, these are kept as close to the upstream kernel as possible, and preferably based on an LTS kernel.
+
+It is understood that in some circumstances it may be necessary to utilize a heavily modified "vendor kernel". Please note that these kernels are typically not provided with any form of long-term support and thus may quickly lack important security and stability fixes. Unless otherwise agreed, the burden of supporting such kernels will remain with the product team. Furthermore, in addition to lacking a source of security fixes, many older kernels are known to have serious vulnerabilities that can only be fully resolved/mitigated by updating to a newer kernel. Apertis strongly discourages the use of such kernels.
+
+The Apertis kernel contains a number of modifications, primarily to enhance the Apparmor support provided by the upstream kernel. The patches used by the stock Apertis kernel can be found in the Apertis [GitLab](https://gitlab.apertis.org/pkg/target/linux/tree/apertis/v2020pre/debian/patches/apparmor). 
In order to support Apertis' use of Apparmor, a kernel needs to support the following Apparmor mediations:
+
+ - file
+ - ptrace
+ - signal
+ - dbus
+ - network
+ - capability
+ - mount
+ - umount
+ - namespaces
+
+Additionally, the kernel should be configured to support the [functionality required by systemd](https://gitlab.apertis.org/pkg/target/systemd/blob/apertis/v2020pre/README).
+
+For development purposes, the kernel should provide early serial debugging and be capable of booting from an NFS rootfs.
+
+## Firmware
+
+It is understood that many hardware platforms may need firmware, provided by the vendor as binaries, to use certain functionality provided by the device. It is still expected that such firmware is packaged as a deb package, though it is understood that source will not be available for such components. The Apertis infrastructure should still be used to build the binary packages.
+
+## Debos .yaml configuration
+
+Apertis uses Debos to automate, in several stages, the conversion of binary packages into images suitable for installation on specific targets. The configuration used for the Apertis reference platforms can be found in [GitLab](https://gitlab.apertis.org/infrastructure/apertis-image-recipes) with their use documented in `README.md`. It is expected that a HWpack provides configuration file(s) that:
+ - Generate the required image(s) from either a reference or project-specific OSpack
+ - Generate images containing the partitioning expected by the target and project
+ - Add any extra components needed via the installation of packages
+ - Are provided with any scripts required to aid in the application of minor changes to tweak the image to the required default configuration
+ - Generate any project-specific OSpacks when sufficient support can't be added to the generic OSpack recipes to cover the functionality required by the relevant project.
+
+### Documentation
+
+Documentation should be provided with the Debos configuration detailing the use of any configuration files provided and documenting the process to be followed to install the generated images into a new target device to yield a booting system.
+
+# Testing
+
+Apertis provides infrastructure to both continuously build and test software on target devices, based on [Docker](https://www.docker.com/), [GitLab CI/CD](https://docs.gitlab.com/ee/ci/) and [LAVA](https://lavasoftware.org/). It is expected that the provided source and configuration artifacts (and possibly binary firmware as mentioned above), when integrated into the provided Apertis infrastructure, will be capable of generating images which pass hardware boot testing with no manual steps required.
+
+# Licensing
+
+Code, including build scripts, helpers and recipes, developed for Apertis should comply with the [Apertis Licensing](https://designs.apertis.org/latest/license-applying.html).
diff --git a/content/designs/image-build-infrastructure.md b/content/designs/image-build-infrastructure.md
new file mode 100644
index 0000000000000000000000000000000000000000..93f59fb98bc99fbb4d7153733063b4850d181048
--- /dev/null
+++ b/content/designs/image-build-infrastructure.md
@@ -0,0 +1,294 @@
+---
+title: Image building infrastructure
+short-description: Overview of the image build infrastructure for Apertis
+authors:
+  - name: Sjoerd Simons
+---
+
+# Apertis image build infrastructure overview
+
+## Introduction
+
+The Apertis infrastructure supports continuous building of reference images,
+hwpacks and ospacks. 
This document explains the infrastructure setup,
+configuration and concepts.
+
+## Technology overview
+
+To build the various packs (hardware, OS) as well as images, Apertis uses
+[Debos](https://github.com/go-debos/debos), a flexible tool to configure the
+build of Debian-based operating systems. Debos uses tools like `debootstrap`
+already present in the environment and relies on virtualisation to securely do
+privileged operations without requiring root access.
+
+For orchestration, Apertis uses the well-known [Jenkins](https://jenkins.io)
+automation server. Following current best practices the Apertis image build
+jobs use Jenkins pipelines (introduced in Jenkins 2.0) to drive the build
+process, as well as doing the actual build inside
+[Docker images](https://jenkins.io/doc/book/pipeline/docker/) to allow for
+complete control of the job-specific build environment without relying on
+job-specific Jenkins slave configuration. As an extra bonus, the Docker images
+used by Jenkins can be re-used by developers for local testing in the same
+environment.
+
+For each Apertis release there are two relevant Jenkins jobs to build images.
+The first job builds a Docker image which defines the build environment and
+uploads the resulting image to the Apertis Docker registry. This is defined in
+the
+[apertis-docker-images git repository](https://gitlab.apertis.org/infrastructure/apertis-docker-images).
+The second job defines the build steps for the various ospacks, hardware packs
+and images, which are run in the Docker image built by the previous job; it
+also uploads the results to images.apertis.org.
+
+## Jenkins master setup
+
+Instructions to install Jenkins can be found on the
+[Jenkins download page](https://jenkins.io/download/). Using the Long-Term
+Support version of Jenkins is recommended. For the Apertis infrastructure the
+Jenkins master is run on Debian 9.3 (stretch).
+
+The plugins that are installed on the master can be found in the [plugins
+appendix][Appendix: List of plugins installed on the Jenkins master].
+
+## Jenkins slave setup
+
+Each Jenkins slave should be installed on a separate machine (or VM) in line
+with the Jenkins best practices. As the image build environment is contained in
+a Docker image, the Jenkins slave requires only a few tools to be installed.
+Apart from running a Jenkins slave itself, the following requirements must be
+satisfied on slave machines:
+* git client installed on the slave
+* Docker installed on the slave and usable by the Jenkins slave user
+* /dev/kvm accessible by the Jenkins slave user (for hw acceleration support in
+  the image builder)
+
+For the last requirement on Debian systems this can be achieved by dropping a
+file called /etc/udev/rules.d/99-kvm-perms.rules in place with the following
+content.
+```
+SUBSYSTEM=="misc", KERNEL=="kvm", GROUP="kvm", MODE="0666"
+```
+
+Documentation for installing Docker on Debian can be found as part of the
+[Docker documentation](https://docs.docker.com/install/linux/docker-ce/debian/).
+To allow Docker to be usable by Jenkins, the Jenkins slave user should be
+configured as part of the `docker` group.
+
+Documentation on how to set up Jenkins slaves can be found as part of the
+[Jenkins documentation](https://wiki.jenkins.io/display/JENKINS/Distributed+builds). 
+
+## Docker registry setup
+
+To avoid building Docker images for every image build round and to make it
+easier for Jenkins and developers to share the same Docker environment for
+build testing, it is recommended to run a Docker registry. The
+[Docker registry documentation](https://docs.docker.com/registry/deploying/)
+contains information on how to set up a registry.
+
+## Docker images for the build environment
+
+The Docker images defining the environment for building the images can be found
+in the
+[apertis-docker-images git repository](https://gitlab.apertis.org/infrastructure/apertis-docker-images).
+
+The toplevel Jenkinsfile is set up to build a Docker image
+based on the [Dockerfile](https://docs.docker.com/engine/reference/builder/)
+defined in the Apertis-image-builder directory and upload the result to the
+public Apertis Docker registry `docker-registry.apertis.org` through the
+authenticated upload channel `auth.docker-registry.apertis.org`.
+
+For Apertis derivatives this file should be adjusted to upload the Docker image
+to the Docker registry of the derivative.
+
+## Image building process
+
+The image recipes and configuration can be found in the
+[apertis-image-recipes git repository](https://gitlab.apertis.org/infrastructure/apertis-image-recipes).
+As with the Docker images, the top-level `Jenkinsfile` defines
+the Jenkins job. For each image type to be built, a parallel job is started
+which runs the image-building toolchain in the Docker-defined environment.
+
+The various recipes provide the configuration for debos; documentation about
+the available actions can be found in the
+[Debos documentation](https://godoc.org/github.com/go-debos/debos/actions).
+
+## Jenkins jobs instantiation
+
+Jenkins needs to be pointed to the repositories hosting the Jenkinsfiles by
+creating matching jobs on the master instance. This can be done either manually
+from the web UI or using the YAML templates supported by the `jenkins-jobs`
+command-line tool from the `jenkins-job-builder` package, version 2.0 or later
+for the support of pipeline jobs.
+
+For that purpose Apertis uses a set of job templates hosted in the
+[`apertis-jenkins-jobs`](https://gitlab.apertis.org/infrastructure/apertis-jenkins-jobs)
+repository (a sketch of such a template is shown below, before the appendix).
+
+## OSTree support (server side)
+
+The image build jobs prepare an OSTree repository to be published server side.
+In order to properly support OSTree on the server side, the `ostree-push`
+package must be installed on the OSTree repository server.
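+
+As an illustration of the job templates mentioned in [][Jenkins jobs instantiation],
+a minimal pipeline job description for `jenkins-jobs` might look like the
+following sketch. The job name, repository URL and branch are placeholders,
+not the actual Apertis job definitions.
+
+```yaml
+# Illustrative sketch only: a minimal jenkins-job-builder pipeline job
+# pointing Jenkins at a repository containing a Jenkinsfile.
+- job:
+    name: example-image-build
+    project-type: pipeline
+    pipeline-scm:
+      scm:
+        - git:
+            url: https://gitlab.apertis.org/infrastructure/apertis-image-recipes.git
+            branches:
+              - apertis/example
+      script-path: Jenkinsfile
+```
+
+Running `jenkins-jobs update` against a directory of such templates creates or
+updates the corresponding jobs on the Jenkins master.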
+
+## Appendix: List of plugins installed on the Jenkins master
+
+At the time of this writing the following plugins are installed on the Apertis
+Jenkins master:
+
+* ace-editor
+* analysis-model-api
+* ant
+* antisamy-markup-formatter
+* apache-httpcomponents-client-4-api
+* artifactdeployer
+* authentication-tokens
+* blueocean
+* blueocean-autofavorite
+* blueocean-bitbucket-pipeline
+* blueocean-commons
+* blueocean-config
+* blueocean-core-js
+* blueocean-dashboard
+* blueocean-display-url
+* blueocean-events
+* blueocean-executor-info
+* blueocean-git-pipeline
+* blueocean-github-pipeline
+* blueocean-i18n
+* blueocean-jira
+* blueocean-jwt
+* blueocean-personalization
+* blueocean-pipeline-api-impl
+* blueocean-pipeline-editor
+* blueocean-pipeline-scm-api
+* blueocean-rest
+* blueocean-rest-impl
+* blueocean-web
+* bouncycastle-api
+* branch-api
+* build-flow-plugin
+* build-name-setter
+* build-token-root
+* buildgraph-view
+* cloudbees-bitbucket-branch-source
+* cloudbees-folder
+* cobertura
+* code-coverage-api
+* command-launcher
+* conditional-buildstep
+* copyartifact
+* credentials
+* credentials-binding
+* cvs
+* display-url-api
+* docker-commons
+* docker-custom-build-environment
+* docker-workflow
+* durable-task
+* email-ext
+* embeddable-build-status
+* envinject
+* envinject-api
+* external-monitor-job
+* favorite
+* forensics-api
+* git
+* git-client
+* git-server
+* git-tag-message
+* github
+* github-api
+* github-branch-source
+* github-organization-folder
+* gitlab-plugin
+* handlebars
+* handy-uri-templates-2-api
+* htmlpublisher
+* hudson-pview-plugin
+* icon-shim
+* jackson2-api
+* javadoc
+* jdk-tool
+* jenkins-design-language
+* jira
+* jquery
+* jquery-detached
+* jsch
+* junit
+* ldap
+* lockable-resources
+* mailer
+* mapdb-api
+* matrix-auth
+* matrix-project
+* mattermost
+* maven-plugin
+* mercurial
+* metrics
+* modernstatus
+* momentjs
+* multiple-scms
+* pam-auth
+* parameterized-trigger
+* phabricator-plugin
+* pipeline-build-step
+* pipeline-github-lib
+* pipeline-graph-analysis
+* pipeline-input-step
+* pipeline-milestone-step
+* pipeline-model-api
+* pipeline-model-declarative-agent
+* pipeline-model-definition
+* pipeline-model-extensions
+* pipeline-rest-api
+* pipeline-stage-step
+* pipeline-stage-tags-metadata
+* pipeline-stage-view
+* plain-credentials
+* pollscm
+* promoted-builds
+* publish-over
+* publish-over-ssh
+* pubsub-light
+* repo
+* resource-disposer
+* run-condition
+* scm-api
+* scoring-load-balancer
+* script-security
+* sse-gateway
+* ssh-agent
+* ssh-credentials
+* ssh-slaves
+* structs
+* subversion
+* timestamper
+* token-macro
+* translation
+* trilead-api
+* variant
+* versionnumber
+* view-job-filters
+* warnings-ng
+* windows-slaves
+* workflow-aggregator
+* workflow-api
+* workflow-basic-steps
+* workflow-cps
+* workflow-cps-global-lib
+* workflow-durable-task-step
+* workflow-job
+* workflow-multibranch
+* workflow-scm-step
+* workflow-step-api
+* workflow-support
+* ws-cleanup
+
+To retrieve the list, access the [script console](https://jenkins.apertis.org/script)
+and enter the following Groovy script:
+
+```
+Jenkins.instance.pluginManager.plugins.toList()
+  .sort{plugin -> plugin.getShortName()}
+  .each{plugin -> println ("* ${plugin.getShortName()}")}
+```
diff --git a/content/designs/infrastructure-maintenance-automation.md b/content/designs/infrastructure-maintenance-automation.md
new file mode 100644
index 0000000000000000000000000000000000000000..4791fe2f1ffdfd151035d1a369a276fd8ef1dd21
--- /dev/null
+++ 
b/content/designs/infrastructure-maintenance-automation.md
@@ -0,0 +1,252 @@
+---
+title: Infrastructure maintenance automation
+short-description: Requirements and plans for automating the Apertis infrastructure maintenance
+authors:
+  - name: Emanuele Aina
+---
+
+# Infrastructure maintenance automation
+
+## Introduction
+
+This document describes the goals and the approaches for automating the
+management of the infrastructure used by Apertis. It will focus in particular
+on release branching, since the new release flow implies that Apertis will need
+to go through that process two or three times more than in the past each
+quarter.
+
+## Goals
+
+### Data-driven
+
+Separating the description of the desired infrastructure state from the tools
+to apply it nicely separates the two concerns: in most cases the tools won't
+need to be updated during branching, only the desired infrastructure state
+changes.
+
+### Git-controlled
+
+Basing everything on configuration stored in a Git repository has several advantages:
+* all the changes are tracked over time
+* the standard Apertis workflows based on GitLab merge requests can be used to review changes
+* fine access controls can be configured via GitLab
+
+### Idempotent
+
+Every tool should compare the current state with the desired one and not
+produce errors when they already match.
+Administrators should be able to run the tools at any time, multiple times,
+without any ill effect.
+
+### Scalable
+
+The Apertis infrastructure is composed of enough services that a centralized
+list of things to update when branching is doomed to be outdated every quarter.
+
+### Single source of truth
+
+The duplication of the same information between modules should be minimized,
+such that updating the single source of truth automatically produces effects
+on the dependent modules.
+
+### Reproducible
+
+Running the tools in a standardized, easily reproducible environment enables
+all the administrators to easily deploy changes without any special setup.
+
+### Explicit
+
+All the needed information should be explicitly encoded in the metadata
+repository. The tools using it should strive not to make any assumptions about
+the data nor to derive extra information from it. This is another facet of
+ensuring that the metadata repository remains the single source of truth.
+
+## Basic approach
+
+The basic approach aims at improving the current branching scripts to make them
+easier to test by developers, enabling more people to work on them, and to
+extend them to fully handle the complete branching process.
+
+### Add test mode for current branching scripts
+
+At the moment the quarterly release branching is done through a [set of
+scripts](https://gitlab.apertis.org/infrastructure/apertis-infrastructure/tree/apertis/v2020pre/release-scripts)
+that get invoked manually by one of the Apertis infrastructure team members
+from their machine.
+
+They act directly on the live services using the caller's accounts.
+
+The first step for improving the branching automation is to add a "dry-run"
+mode to all the current release scripts to let developers and admins run them
+without affecting the live services.
+
+### Improve coverage of current branching scripts
+
+The scripts currently in charge of reducing the manual intervention during the
+branching process do not cover all the services and repositories which are
+part of Apertis.
+
+Once the "dry-run" mode is in place, new steps need to be added to the branching
+scripts to cover the missing services and repositories. 
+
+## Longer term approach
+
+Larger refactorings are needed to align the current infrastructure to the goals
+previously described.
+
+The following sections describe the steps needed to further improve the
+infrastructure maintenance to make it more robust and require less effort
+to manage.
+
+### Centralized metadata
+
+A new git repository contains the principal metadata about the whole Apertis
+infrastructure, describing:
+
+* the currently active release branches
+  * e.g. `v2020pre`, `v2019`, etc.
+* their phase in the release lifecycle
+  * e.g. `development`, `preview`, `stable`
+* their release status
+  * e.g. `frozen`, `release-candidate`, `released`
+* the release they get branched from
+  * e.g. `v2019pre` for both `v2019` and `v2020dev0`
+* the matching git branch name
+  * e.g. `apertis/v2019`
+* the APT components they ship
+  * e.g. `target`, `development`, `sdk`, `hmi`
+* etc.
+
+This provides a git-controlled single source of truth: tools are updated to
+fetch the information they need from this repository.
+
+For instance, the creation of OBS projects should be handled by a tool that:
+* fetches the metadata YAML described above
+* checks the current OBS configuration
+* computes the changes needed compared to the desired state, if any
+* applies the changes, if any, to align the actual state to the desired state,
+  providing an idempotent solution
+* runs from a GitLab pipeline, providing a reproducible environment that
+  can be either triggered by changes in the main data repository or manually
+
+The current infrastructure encodes a lot of information about the releases
+in several places: tools should be changed to fetch such information on the fly
+from the main data repository, or GitLab pipelines should be configured to
+monitor the main data repository and automatically apply changes accordingly.
+
+For instance, the LAVA job templates encode the branch name of the release they
+track in multiple places. Either the templates can be enhanced to fetch the
+information on the fly from the main data repository, or a pipeline should be
+configured in a dedicated branch in the repository to monitor the main data
+repository and branch/update the repository accordingly.
+
+The change compared to the current approach is to minimize the amount of
+information that needs to be branched and distribute the branching logic closer
+to the entity to be branched. This is meant to avoid the recurring issues where
+the current centralized branching scripts failed to branch things properly or
+did not include new components to be branched at all.
+
+### Per-repository branching operations
+
+For most repositories it is sufficient to add a new git ref when branching for
+a new release. In particular, nearly all the packaging ones do not need any
+change to the repository contents, and creating a new ref is enough.
+
+Other repositories instead need some changes to be applied to the contents once
+a new release branch is created. A common reason is that the release name is
+encoded in some file, which means that the file needs to be updated and the
+change needs to be committed and pushed.
+
+By making branching self-contained in the repositories, moving and renaming them
+no longer causes breakage. It also gives full control over the branching of a
+repository to the people managing that repository, rather than those who manage
+the centralized repository. This can be especially useful for components not
+managed by the core Apertis team, owned instead by product-specific teams. 
+
+In general, keeping the branching operations in the same place as the rest of
+the contents helps in keeping them coherent and makes testing easier.
+
+## Implementation
+
+### Add test mode for current branching scripts
+
+Setting the `NOACT=y` environment variable causes the branching scripts to run
+in test mode, without actually launching the branching commands.
+
+### Improve coverage of current branching scripts
+
+New actions need to be taken when branching a new release.
+
+This is a non-exhaustive list:
+* branch LAVA job templates;
+* update the configuration on GitLab repositories to create the new release
+  branch, make it the default, etc.;
+* create the relevant `:snapshots` repositories on OBS;
+* add support for the `security`, `updates` and `backports` repositories when
+  branching stable releases.
+
+### Centralized metadata
+
+The centralized information can be modeled as YAML, for instance:
+
+```yaml
+
+.common_components: &common_components
+  - target
+  - development
+  - sdk
+  - hmi
+
+projects:
+  apertis:
+    releases:
+      v2019:
+        lifecycle: stable
+        status: released
+        branched-from: v2019pre
+        branch-name: apertis/v2019
+        upstream: debian/buster
+        obs-build-suffix: v2019.0
+        suites:
+          v2019:
+            obs-pattern: '$project:$release:$component'
+            components: *common_components
+          v2019-updates:
+            obs-pattern: '$project:$release:updates:$component'
+            components: *common_components
+          v2019-security:
+            obs-pattern: '$project:$release:security:$component'
+            components: *common_components
+        infrastructure-packages:
+          obs: apertis:infrastructure:v2019
+          suite: infrastructure-v2019
+          components:
+            - buster
+      v2020dev0:
+        lifecycle: development
+        status: frozen
+        branched-from: v2019pre
+        branch-name: apertis/v2020dev0
+        upstream: debian/buster
+        obs-build-suffix: v2020dev0
+        suites:
+          v2020dev0:
+            obs-pattern: '$project:$release:$component'
+            components: *common_components
+        infrastructure-packages:
+          obs: apertis:infrastructure:v2019
+          suite: infrastructure-v2019
+          components:
+            - buster
+```
+
+### Per-repository branching operations
+
+A `release-branching` step should be added to the GitLab CI pipeline YAML in
+the repository, with the purpose of ensuring that the release-specific contents
+match the branch name.
+
+GitLab does not provide any way to execute an action only when a new ref is
+created, so the best strategy is to ensure that the `release-branching` script
+is idempotent and gets run when changes land on any `apertis/*` refs: if
+no changes are detected the step succeeds with no further operations; otherwise
+it commits and pushes the changes automatically, or it submits a merge request
+to be reviewed before landing the changes. 
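+
+As a sketch, such a step could be wired into a repository's `.gitlab-ci.yml`
+along the following lines. The script name is a hypothetical placeholder: the
+actual logic is whatever each repository needs to align its contents with the
+branch it lives on.
+
+```yaml
+# Illustrative sketch only: run the idempotent branching script on
+# every change landing on an apertis/* ref.
+release-branching:
+  stage: build
+  only:
+    refs:
+      - /^apertis\//
+  script:
+    - ./ci/release-branching.sh "$CI_COMMIT_REF_NAME"
+```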
diff --git a/content/designs/infrastructure.md b/content/designs/infrastructure.md
new file mode 100644
index 0000000000000000000000000000000000000000..9106e4d0635025d6287956a47dd4185597e84277
--- /dev/null
+++ b/content/designs/infrastructure.md
@@ -0,0 +1 @@
+# Apertis Infrastructure
diff --git a/content/designs/inter-domain-communication.md b/content/designs/inter-domain-communication.md
new file mode 100644
index 0000000000000000000000000000000000000000..43a1675f0c698fe82cc89679a61b3a1e6f21bbfa
--- /dev/null
+++ b/content/designs/inter-domain-communication.md
@@ -0,0 +1,3800 @@
+---
+title: Inter-domain communication
+short-description: Suggested design for an inter-domain communication system
+  (proof-of-concept)
+authors:
+  - name: Philip Withnall
+  - name: Robert Foss
+  - name: Justin Kim
+---
+
+# Inter-domain communication
+
+## Introduction
+
+This document describes a suggested design for an inter-domain communication
+system, which exports services between different domains. Some domains are
+trusted, such as the automotive domain; others, such as the
+consumer-electronics domain, are untrusted. Those domains can execute in a
+variety of possible configurations.
+
+The major considerations with an inter-domain communication system are:
+
+ - Security. The purpose of having separate domains is for
+   security, so that untrusted code (application bundles) can be run in
+   one domain while minimising the attack surface of the
+   safety-critical systems which drive the car.
+
+ - Flexibility for different hardware configurations. The domains
+   may be running in one of many configurations: virtualised under a
+   hypervisor; on separate CPUs on the same board; on separate boards
+   connected by a private in-vehicle network; as separate boards
+   connected to a larger in-vehicle network with unrelated peers on it;
+   in separate containers.
+
+ - Flexibility for services exposed. The services exposed by the automotive
+   domain are dependent on the vendor which implemented the automotive domain.
+   The consumer-electronics domain depends on third parties, whose update and
+   enhancement cycles and security rules may differ.
+
+ - Asynchronism and race conditions. This is a distributed system, and hence
+   is subject to all of the [challenges][conc-dis-sys] typical of distributed
+   systems.
+
+## Terminology and concepts
+
+### Automotive domain
+
+The *automotive domain* (AD) is a security domain which runs automotive
+processes, with direct access to hardware such as audio output or the
+in-vehicle bus (for example, a CAN bus or similar).
+
+In some literature this domain is known as the ‘blue world’. This
+document will consistently use the term *automotive domain* or *AD*.
+
+### Consumer-electronics domain
+
+The *consumer-electronics domain* (CE domain; CE) is a security domain
+which runs the user’s infotainment processes, including downloaded
+applications and processing of untrusted content such as downloaded
+media. Apertis is one implementation of the CE domain.
+
+In some literature this domain is known as the ‘red world’,
+‘infotainment domain’ or ‘IVI domain’. This document will consistently
+use the term *consumer-electronics domain* or *CE domain* or *CE*.
+
+### Connectivity domain
+
+In some setups the *AD* and *CE* are not directly exposed to external networks
+and hardware. 
In those cases a *connectivity domain* hosts agents which can +directly access the Internet or plug-and-play hardware devices such as USB +keys, SD cards or Bluetooth devices and provide their services to applications +running in the more isolated domains. This domain can be referred to as *CD*. + +### Trusted path + +A [trusted path] is an end-to-end communications channel from the +user to a specific software component, which the user can be confident +has integrity, and is addressing the component they expect. This +encompasses technical security measures, plus unforgeable UI indications +of the trusted path. + +An example of a trusted path is the old Windows login screen, which +required the user to press Ctrl+Alt+Delete to open the login dialogue. +If a malicious application was impersonating the login dialogue, +pressing Ctrl+Alt+Delete would open the task manager instead of the +login dialogue, exposing the subversion. + +In the context of Apertis, an example situation calling for a trusted +path is when the user needs to interact with a UI provided by the AD. +They must be sure that this UI is not being forged by a malicious +application running in the CE. + +### Control stream + +A *control stream* is a network connection which transmits low +bandwidth, latency insensitive messages which typically contain metadata +about data being transferred in a data stream. In networking, it is +sometimes known as the *control plane*. + +A control stream for one protocol may be treated as a data stream if it +is being carried by a higher layer (or wrapper) protocol, as the control +data in the stream is meaningless to the higher layer protocol. + +If a designer is concerned about whether a particular stream’s +performance requirements make it suitable for running as a control +stream, it almost certainly is not a control stream, and should be +treated as a data stream. A new control protocol should be built to +carry more limited metadata about it. + +A control stream can operate without a data stream (for example, if +there is no performance-sensitive data to transmit). + +### Data stream + +A *data stream* is a network connection which transmits the data +referred to by a control stream. This data may be high bandwidth or +latency sensitive, or it may be neither. In networking, it is sometimes +known as the *data plane*. + +A data stream cannot operate without an associated control stream (which +carries its metadata). + +### Traffic control + +Traffic control (or [bandwidth management]) is the term for a variety +of techniques for measuring and controlling the connections on a +network link, to try and meet the quality of service requirements for +each connection, in terms of bandwidth and latency. + +## Use cases + +A variety of use cases which must be satisfied by an inter-domain +communication system are given below. Particularly important discussion +points are highlighted at the bottom of each use case. + +All of these use cases are relevant to an inter-domain communication +system, but some of them (for example, [][Video or audio decoder bugs]) may equally well be +solved by other components in the system. + +### Standalone setup + +An app-centric consumer electronics domain (CE) is running in a virtual +machine on a developer’s laptop, and they are using it to develop an +application for Apertis. There is no automotive domain (AD) for this CE +to run against, but it must provide all the same services via its SDK +APIs as the CE running in a vehicle which has an Apertis device. 
The CE
+must run without an accompanying AD in this configuration.
+
+### Basic virtualised setup
+
+An embedded automotive domain (AD) and an app-centric consumer
+electronics domain (CE) are running as separate virtualised operating
+systems under a hypervisor, in order to save costs on the bill of
+materials by only having one board and CPU. The AD has access to the
+underlying physical hardware; the CE does not. The two domains have a
+high bandwidth connection to each other (for example, Ethernet, USB, PCI
+Express or virtio). The two domains need to communicate so that the CE
+can access the hardware controlled by the AD.
+
+### Linux container setup
+
+Containers are based on Linux kernel containment features, including, but not
+limited to, Linux kernel namespaces, control groups, chroots (pivot_root)
+and capabilities.
+
+Both AD and CE are dedicated Linux containers on a host directly running on the
+hardware or in a virtual machine. The AD is allowed to access safety-sensitive
+devices. The CE is not allowed any access to safety-sensitive devices, but may
+be able to access external devices like smartphones over Bluetooth, USB mass
+storage or security keys.
+
+Communication is based on the Unix Domain Sockets (UDS) mechanism provided
+by the Linux kernel.
+
+This setup can be used both for production setups on hardware boards and
+on a developer’s system for Apertis application development.
+A fake AD container can be provided for emulation and testing purposes.
+
+Isolation between containers is unavoidably limited when compared to the
+isolation between virtual machines, just as separate boards provide more
+isolation than VMs. This is because a single kernel is shared by all
+containers. However, in this document we assume processes are not able
+to escape from the isolated environment or get access to resources on the
+host system or other containers for which they haven’t been explicitly
+granted access.
+
+[Multiple CE domains][Multiple CE domains setup] are allowed with the
+above setup.
+In this setup, a [Connectivity Domain][Connectivity Domain] can also coexist
+with the AD and CE. It is responsible for any interaction with external
+networks, and provides isolation in case a network stack is compromised,
+as long as that stack is not implemented in the shared kernel.
+
+### Separate CPUs setup
+
+The AD is running on one CPU, and the CE is running on another CPU on
+the same board. The two CPUs have separate memory hierarchies. They
+may be using separate architectures or endianness. The AD has access to
+all of the underlying physical hardware; the CE only has access to a
+limited number of devices, such as its own memory and some kind of high
+bandwidth connection to the AD (for example, Ethernet, USB, or PCI
+Express). The two domains need to communicate so that the CE can access
+the hardware controlled by the AD.
+
+### Separate boards setup
+
+The AD is running on one mainboard, and the CE is running on another
+mainboard, which is physically separate from the first. They may be
+using separate architectures or endianness. The two boards are connected
+by some kind of vehicle network (for example, Ethernet; but other
+technologies could be used). There are no other devices on this network.
+The vehicle owner (and any other attacker) might have physical access to
+this network. The AD has access to various devices which are connected
+to its board and not to the CE’s board.
+The two domains need to communicate so that the CE can access the hardware
+controlled by the AD.
+
+### Separate boards setup with other devices
+
+The AD is running on one mainboard, and the CE is running on another
+mainboard, which is physically separate from the first. They may be
+using separate architectures or endianness. The two boards are connected
+by some kind of vehicle network (for example, Ethernet; but other
+technologies could be used). There are many other devices on this
+network, which are addressable but whose traffic is irrelevant to the
+CE–AD connection (for example, a telematics modem, or a high-end
+amplifier). The vehicle owner (and any other attacker) might have
+physical access to this network. The AD has access to various devices
+which are connected to its board and not to the CE’s board. The two
+domains need to communicate so that the CE can access the hardware
+controlled by the AD.
+
+*(Note: This is a much lower priority than other setups, but should
+still be considered as part of the overall design, even if the code for
+it will be implemented as a later phase.)*
+
+### Multiple CE domains setup
+
+The AD is running on one mainboard. Multiple CE domains are running,
+each on a separate mainboard, physically separate from each other
+and from the AD. The boards are connected by some kind of vehicle
+network (for example, Ethernet; but other technologies could be used).
+There are many other devices on this network, which are addressable but
+whose traffic is irrelevant to the CE–AD connections (for example, a
+telematics modem, or a high-end amplifier). The vehicle owner (and any
+other attacker) might have physical access to this network. The AD has
+access to various devices which are connected to its board and not to
+the CEs’ boards. Each CE domain needs to communicate with the AD so that
+it can access the hardware controlled by the AD.
+
+*(Note: This is a much lower priority than other setups, but should
+still be considered as part of the overall design, even if the code for
+it will be implemented as a later phase.)*
+
+### Touchscreen events
+
+The touchscreen hardware is controlled by the AD, but content from the
+CE is displayed on it. In order to interact with this content, touch events
+which are relevant to content from the CE must be forwarded from the AD
+to the CE. Users expect minimal latency for touchscreen event
+handling. Touchscreen events must continue to be delivered reliably and
+on time even if a large amount of bandwidth is being consumed by
+other inter-domain communications between the AD and CE.
+
+### Wi-Fi access
+
+The Wi-Fi hardware is controlled by the AD or CD. The CE needs to use it
+for internet access, including connecting to a network. The Wi-Fi device
+can return data at high bandwidth, but also has a separate control
+channel. The control channel always needs to be available, even if
+traffic is being dropped due to bandwidth limitations in the
+inter-domain communication channel.
+
+As the Wi-Fi connection is used for general internet access, sensitive
+information might be transferred between domains (for example, authentication
+credentials for a website the user is logging in to). Attackers who are
+snooping the inter-domain connection must not be able to extract such
+sensitive data from the inter-domain communications link.
+
+(*Note that they may still be able to extract sensitive data from
+insecure connections over the wireless connection itself, or elsewhere
+in transit outside the vehicle; so any solution here is the best
+mitigation we can manage for the problem of a website being insecure.*)
+
+### Bluetooth access
+
+The Bluetooth hardware might be attached to the AD or CD. The CE
+needs to be able to send data bi-directionally to other Bluetooth
+devices, and also needs to be able to control the Bluetooth device,
+including pairing and other functions of the Bluetooth hardware.
+
+To support the A2DP and HSP/HFP audio profiles it may be desirable
+to keep the CE in charge of decoding and encoding the audio streams
+coming from and directed to the Bluetooth devices. The AD will be
+responsible for mixing the output streams directed to the car speakers
+and capturing input streams (possibly with noise cancellation) from
+the car microphones.
+
+The following diagrams depict the data and control flow when the Bluetooth
+device is attached to the AD.
+
+Sending audio stream from BT to AD:
+
+```
+BT device           AD                      CE
+    | --- attach --->|                      |
+    | ---------- encoded audio -----------> |
+    |                | <--- decoded audio --|
+    |        (mixing decoded audio in AD)
+```
+
+Sending audio stream from AD to BT:
+
+```
+BT device           AD                      CE
+    | --- attach --->|                      |
+    |                | ---- LPCM audio ---->|
+    | <--------- encoded audio ------------|
+```
+
+The following diagram depicts the data and control flow when the Bluetooth
+device is directly attached to the CE instead.
+
+```
+BT device               CE                  AD
+    | ------- attach ------>|               |
+    | <------ control ------|               |
+    |                       |               |
+    | --- encoded audio --->|               |
+    |                       | - LPCM audio ->
+    |                       | <- LPCM audio -
+    | <-- encoded audio ----|
+```
+
+The following diagram depicts the data and control flow when the Bluetooth
+device is directly attached to the CD.
+
+```
+BT device            CD              CE              AD
+    | ---- attach --->|              |               |
+    | <--- control ---|              |               |
+    |                 | <-- scan ----|               |
+    |                 | --- result ->|               |
+    |                 | <-- play ----|               |
+    |                 |              |               |
+    | -encoded audio->|              |               |
+    |                 | --------- LPCM audio ------> |
+    |                 | <-------- LPCM audio ------- |
+    | <-encoded audio-|
+```
+
+Multiple variations are possible on this model.
+
+### Audio transfer
+
+The audio amplifier hardware might be attached to the AD hardware, or
+might be set up as a separate hardware amplifier attached to the
+in-vehicle network. The CE needs to be able to send multiple streams of
+decoded audio output to the AD, to be mixed with audio output from the
+AD according to some prioritisation logic.
+
+The decoded audio streams should be in LPCM format, but other formats may
+be negotiated by the domains using application-specific APIs.
+
+Metadata can be sent alongside the audio, such as track names or timing
+information.
+
+Audio output needs predictable latency, and for video conferencing it
+needs low latency as well; conversely, some level of packet loss is
+acceptable for audio traffic. However, the latency should not exceed
+a certain amount of time in some specific cases:
+
+  - Voice recognition systems provided through phone integration require
+    that the maximum latency of the audio buffer from the time it gets
+    captured by the microphone controlled by the AD to the time it gets
+    delivered to the phone attached to the CE domain must not exceed 35ms.
+
+  - Text-to-speech systems provided through phone integration require that
+    the maximum latency of the audio buffer from the time it is received
+    by the CE domain from the attached phone to the time it gets played
+    back on the speakers attached to the AD must not exceed 35ms.
+
+  - The total round-trip time must not exceed 275ms when the phone is attached
+    to the CE domain through a wired transport (for instance, USB CDC-NCM as
+    used by CarPlay, or the Android Open Accessory Protocol) and 415ms on
+    wireless transports (Wi-Fi in particular; Bluetooth A2DP is not
+    recommended in this case).
+
+  - Bluetooth SCO can be used when there is a latency constraint. It will be
+    lower quality, but the transfer time over the air is guaranteed. The whole
+    audio chain needs to satisfy the latency condition, though. This is why,
+    in some setups, the Bluetooth audio is routed directly to the AD amplifier.
+    When this is the case, an API to enable this link is provided by the domain
+    that owns the Bluetooth hardware. This can be the AD, or the CD embedding
+    a Bluetooth stack. The API calls would be issued by the CE domain.
+
+### Video decoding
+
+There might be a specific hardware video decoder attached to the AD
+hardware, which the CE operating system wishes to use for offloading
+decoding of trusted or untrusted video content. This is high bandwidth,
+but means that the output from the video decoder could potentially be
+directed straight onto a surface on the screen.
+
+(See [][Appendix: Audio and video decoding] for a discussion of options for
+video and audio decoding.)
+
+#### Video or audio decoder bugs
+
+The CE has a software video or audio decoder for a particular video or
+audio codec, and a security-critical bug is found in this decoder,
+which could allow malicious video or audio content to gain arbitrary
+code execution privileges when it’s decoded. An update for the Apertis
+operating system is released which fixes this bug, and users need to
+apply it to their vehicles. To reduce the window of opportunity for
+exploitation, this update has to be applicable by the vehicle owner,
+rather than requiring the vehicle to be taken into a garage (which could
+take weeks).
+
+> For example, the series of exploitable bugs which affected the
+> ‘secure’ media decoding library on Android in 2015:
+> [*https://en.wikipedia.org/wiki/Stagefright\_%28bug%29*](https://en.wikipedia.org/wiki/Stagefright_\(bug\))
+
+(*Note: This means we cannot securely support decoding untrusted video
+or audio content in the AD, due to its slow software update cycle,
+unless we use a* hardware *video decoder which is specifically designed
+to cope with malicious inputs.*)
+
+### Streaming media
+
+The media player backend on the CE accesses local files or internet streams and
+sends the streams to the Media Player HMI running in the AD. The CE might be
+able to perform demuxing, decoding, or at least partial verification of the
+streams.
+
+The AD might accept fully decoded streams, but the media file or stream is
+usually encoded and multiplexed. In some cases, the multiplexed stream can have
+synchronization-sensitive metadata like subtitles. Therefore, if demuxing and
+decoding are performed in different domains, the AD should support multiple
+channels and mix the streams with time synchronization information.
+
+It is also possible that the AD sends the stream to the CE.
+For example, in the case of Internet phone applications, the CE provides the
+HMI and needs to be able to capture video and audio streams from the AD,
+before encoding and multiplexing them on the CE.
+
+When handling data streams that don’t need strict synchronization, the bulk
+data transfer mechanism is recommended. For example, sharing still pictures
+does not require real-time processing, so it is not suited for the streaming
+media mechanism.
+
+### Downloads of firmware updates
+
+An OTA update agent in the Connectivity domain downloads firmware images as
+large as 20GB each, or retrieves them from an attached USB stick, and needs to
+share them with the Automotive domain, where the FOTA backend can flash the
+attached devices.
+
+Since firmware images are very large, storing them twice should be avoided,
+as the available space may not be sufficient to do so.
+
+### Offline and online map data
+
+An offline map agent in the Connectivity domain downloads map data for offline
+usage by the navigation system running in the Automotive domain.
+
+Conversely, an online map agent in the Connectivity domain handles requests
+from the Automotive domain for map tiles to download.
+
+### Phonebook integration
+
+A phonebook agent in the Connectivity domain retrieves approximately 500
+256×256px profile pictures, validates and re-encodes them to PNG, and makes
+them available to the Automotive domain, possibly using an uncompressed zip
+file instead of sharing 500 individual files.
+
+### Tinkering vehicle owner on the network
+
+The owner of a vehicle containing an Apertis device likes to tinker with
+it, and is probing and injecting signals on the connection between the
+AD and CE, or even replacing the CE completely with a device under their
+control. They should not be able to make the automotive domain do
+anything outside its normal operating range; for example, uncontrolled
+acceleration, or causing services in the domain to crash or shut down.
+
+The tampering must be detectable by the vendor when the vehicle is
+serviced or investigated after an accident.
+
+### Tinkering vehicle owner on the boards
+
+The owner of a vehicle containing an Apertis device likes to tinker with
+it, and has gained access to the bootloaders and storage for both the AD
+and CE boards. They have managed to add some custom software to the CE
+image, which is now sending messages to the AD which it does not expect,
+or vice versa. The domain receiving the messages must not crash, must
+ignore invalid messages, and must not cause unsafe vehicle behaviour.
+
+The tampering must be detectable by the vendor when the vehicle is
+serviced or investigated after an accident.
+
+(*Note that secure bootloading itself is a separate topic.*)
+
+### Support multiple AD operating systems
+
+The OEM for a vehicle wants to choose the operating system used in the
+AD — for example, it might be GENIVI Linux, or QNX, or something else.
+There is limited opportunity to modify this operating system to
+implement Apertis-specific features. Whichever CE or CD system is installed
+needs to interface with it. Each AD operating system may expose its
+underlying hardware and services through a variety of different
+non-standardised interfaces, which use push- and pull-style APIs for
+transferring data. The OEM wishes to be provided with an inter-domain
+communication library to integrate into their choice of AD operating
+system, which will provide all the functionality necessary to
+communicate with Apertis as the CE or CD operating system.
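+
+As an illustration only (all type and function names below are hypothetical,
+not an existing Apertis API), such a portable integration library might expose
+a C interface along these lines:
+
+```c
+/* Hypothetical sketch of a portable AD integration library; every name
+ * here is illustrative rather than an existing Apertis API. */
+#include <stddef.h>
+#include <stdint.h>
+
+typedef struct IdcConnection IdcConnection;
+
+/* Pull style: the CE invokes a method on an exported interface; the AD
+ * handles it here. (A real API would also carry a way to send a reply;
+ * that is omitted to keep the sketch short.) */
+typedef void (*IdcMethodHandler) (const char *method_name,
+                                  const uint8_t *args, size_t args_len,
+                                  void *user_data);
+
+/* Establish the authenticated link to the CE over whichever transport
+ * (virtio, point-to-point Ethernet, Unix domain socket, etc.) this
+ * deployment uses. */
+IdcConnection *idc_connect (const char *transport_address);
+
+/* Register a versioned, OEM-specific interface so that its methods
+ * become callable from the CE over the inter-domain link. */
+int idc_export_interface (IdcConnection *connection,
+                          const char *interface_name,  /* e.g. "com.example.Hvac1" */
+                          IdcMethodHandler handler,
+                          void *user_data);
+
+/* Push style: emit a signal which subscribed CE services receive. */
+int idc_emit_signal (IdcConnection *connection,
+                     const char *interface_name,
+                     const char *signal_name,
+                     const uint8_t *payload, size_t payload_len);
+```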
+
+### Before-market upgrades
+
+The OEM for a vehicle has chosen a specific version of an operating
+system for their AD, and has initially released their vehicle with
+Apertis 17.09 on another domain, such as the CE and/or CD. For the latest
+incremental version of this vehicle, they want to upgrade the other
+domain to use Apertis 18.06. The OS in the AD cannot be changed, due
+to having stricter stability and testing requirements than the other
+domains.
+
+### After-market upgrades
+
+A user has bought a vehicle which runs Apertis 17.09 in its CE. Apertis
+18.06 is released by their car vendor, and their garage offers it as an
+upgrade to the user as part of their next car service. The garage
+performs this software upgrade to the CE, without having to touch the
+AD. It verifies that the system is operational, and returns the car to
+the user, who now has access to all the new features in Apertis 18.06
+which are supported by their vehicle’s hardware.
+
+### Testability
+
+When developing a new vehicle, an OEM wants to iterate quickly on
+changes to the CE, but also wants to test them thoroughly for
+compatibility against a specific AD version, to ensure that the two
+domains will work together. They want this testing to include a number
+of valid and invalid conversations between the CE and AD, to ensure that
+the two domains implement error handling (and hence a large part of
+their security) correctly.
+
+### Malicious CE
+
+Somehow, a third party application installed onto the CE manages to
+compromise a system service and gain arbitrary code execution privileges
+in the CE. It uses these privileges to send malicious messages to the
+AD. From the user’s point of view, this could result in a loss of IVI
+functionality, and unexpected behaviour from vehicle actuators, but must
+not result in loss of control of the vehicle.
+
+### Malicious CD
+
+Protocol vulnerabilities have recently been discovered that allowed an
+attacker to take control of a device remotely. To mitigate this, the
+network management stack has been moved to a Connectivity Domain.
+The impact of those attacks must be minimised. While the CD functionality
+may be degraded, this must not result in loss of control of the vehicle.
+
+### After-market upgrade of a domain
+
+A user has bought a vehicle containing a low-end Apertis device. They
+wish to upgrade to a more fully-featured Apertis device, and this
+hardware upgrade is offered by their garage. The garage performs the
+upgrade, which replaces the existing CE hardware with a new separate CE
+board. If the existing hardware combined the AD and CE on a single board
+or virtualised processor, the entire board is replaced with two new,
+separate boards, one for each domain (though as this is a complex
+operation, some garages or vendors might not offer it). If the existing
+hardware already had separate boards for the two domains, only the CE
+board is upgraded — this may be a service offered by all garages.
+
+### Power cycle independence of domains (CE down)
+
+Due to a bug, the CE crashes. The AD must not crash, and must continue
+to function safely. It may display an error message to the user, and the
+user may lose unsaved data. Once the CE restarts, the AD should
+reconnect to it and reestablish a normal user interface. The CE should
+reboot quickly, and the cross-domain state should be restored as far as
+is reasonable once it has restarted.
+
+Any partially-complete inter-domain communications must error out rather
+than remaining unanswered indefinitely.
+
+The same situation applies if both domains are booting simultaneously,
+but the CE is slower to boot than the AD, for example — the AD will be
+up before the CE, and hence must deal with not being able to communicate
+with it. See also [][Plug-and-play CE device].
+
+### Power cycle independence of domains (AD down, single screen)
+
+On a system where the AD and CE are sharing a single screen, if the AD
+crashes, the CE must not crash; it may gracefully shut down, and only
+restart once the AD has finished rebooting. The AD should reboot quickly,
+and the cross-domain state should be restored as far as is reasonable
+once it has restarted.
+
+Any partially-complete inter-domain communications must error out rather
+than remaining unanswered indefinitely.
+
+The same situation applies if both domains are booting simultaneously,
+but the AD is slower to boot than the CE, for example — the CE will be
+up before the AD, and hence must deal with not being able to communicate
+with it. See also [][Plug-and-play CE device].
+
+### Power cycle independence of domains (AD down, multiple screens)
+
+On a system with multiple output screens, if the AD crashes, the CE must
+not crash, and should continue to run on all its screens, as another
+user may be using the CE (without requiring any functionality from the
+AD) on one of the screens. Once the AD restarts, the CE should reconnect
+to it and reestablish a normal user interface on all screens. The AD
+should reboot quickly, and the cross-domain state should be restored as
+far as is reasonable once it has restarted.
+
+Any partially-complete inter-domain communications must error out rather
+than remaining unanswered indefinitely.
+
+The same situation applies if both domains are booting simultaneously,
+but the AD is slower to boot than the CE, for example — the CE will be
+up before the AD, and hence must deal with not being able to communicate
+with it. See also [][Plug-and-play CE device].
+
+### Temporary communications problem
+
+There is a temporary communications problem between a service on the AD
+and its counterpart on the CE. Either:
+
+  - The service (on the AD or CE) has crashed.
+
+  - There is a problem with the physical connection between the domains,
+    such as dropped packets due to congestion; but both domains are
+    still running fine.
+
+  - The entire domain or its inter-domain communications service has
+    crashed.
+
+The different situations can be detected by the parts of the stack which
+are still working.
+
+If a service has crashed, the inter-domain communication service should
+return an appropriate error code to the other domain, which could
+propagate the error to a calling application, or wait for the other
+domain to restart that service and try again.
+
+If there is packet loss, the reliability in the inter-domain
+communication protocol should cause the lost packets to be re-sent.
+Services should wait for that to happen. If the communications problem
+continues for longer than a timeout, each domain must assume that the
+other has crashed, and behave accordingly.
+
+If a domain has crashed, the other domain must wait for it to be
+restarted via its watchdog, as in [][Power cycle independence of domains (CE down)].
+
+In all cases, the domain which is still running must not shut down or
+enter a ‘paused’ state, as that would allow denial of service attacks.
+
+### New version of AD software
+
+An OEM has released a vehicle with version A of their AD operating
+system, and version 15.06 of Apertis running in the CE.
+For the next
+minor update to their vehicle, the OEM has made a number of changes to
+the underlying AD software, but not to its external interfaces. They
+wish to keep the same version of Apertis running in the CE, and release
+the vehicle using this version B of their AD operating system, and
+version 15.06 of Apertis.
+
+### New version of AD interfaces
+
+An OEM has released a vehicle with version A of their AD operating
+system, and version 15.06 of Apertis running in the CE. For the next
+minor update to their vehicle, the OEM has made a number of changes to
+the underlying AD software, and has changed a few of its external
+interfaces and exposed a few more vehicle-specific features in new
+interfaces. They want to make appropriate modifications to Apertis to
+align it with these changed interfaces, but do not wish to make major
+modifications to Apertis, and wish to (broadly) stick with version
+15.06. They will release the vehicle using this version B of their AD
+operating system, and a tweaked version 15.06 of Apertis.
+
+In other words, this scenario applies only when the OEM has updated the
+AD, and wants to make a corresponding update to the CE. For the reverse
+scenario where the CE has been upgraded, it is required that the AD does
+not need to be updated: see [][Plug-and-play CE device] and
+[][After-market upgrades].
+
+### Unsupported AD interfaces
+
+An OEM uses an AD operating system which exposes a large number of
+interfaces to various esoteric automotive components. Only a few of
+these components are currently supported by Apertis version A, which
+they are running in their CE. Apertis version B supports some more of
+these components, and exposes them in its SDK APIs. The OEM wishes to
+release a new version of the same vehicle, keeping the same version of
+the AD operating system, but using version B of Apertis and exposing the
+now-supported components in the SDK APIs.
+
+However, some of the other components which are exposed by the AD
+operating system in its inter-domain interface cannot be securely
+supported by Apertis (for example, they may allow unrestricted write
+access to the in-vehicle network). These should not be accessible by the
+SDK APIs at any time.
+
+### Contacts sharing
+
+A vehicle maintains an address book in its AD operating system, which
+stores some of the user’s contacts on a removable SD card. The user
+interface, run by the CE, needs to be able to display and modify these
+contacts in the Apertis address book application.
+
+### Protocol compatibility
+
+An older vehicle, using an old version A of some AD operating system,
+was using a corresponding version A of Apertis in its CE. The CE operating
+system is upgraded to a recent version of Apertis, version B, by the
+garage when the vehicle is taken in for a service. This version of
+Apertis uses a much more recent version of the underlying software for
+the inter-domain communication protocol. It needs to continue to work
+with the old version A of the AD operating system, which is running a
+much older version of the protocol software.
+
+#### kdbus protocol compatibility
+
+If, for example, the inter-domain communication protocol is implemented
+using dbus-daemon in version A of the AD operating system, and in the
+corresponding version A of Apertis; and version B of Apertis uses kdbus
+instead of dbus-daemon, the two OSs must still communicate successfully.
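+
+One way to satisfy this, shown purely as an illustrative sketch (the
+constant and function names are hypothetical), is for each peer to
+advertise the range of protocol versions it supports when the link is
+established, with both sides then settling on the highest version they
+have in common:
+
+```c
+/* Hypothetical sketch of inter-domain protocol version negotiation;
+ * names and version numbers are illustrative only. */
+#define IDC_PROTOCOL_VERSION_MIN 1  /* oldest version this peer accepts */
+#define IDC_PROTOCOL_VERSION_MAX 3  /* newest version this peer implements */
+
+/* Each peer advertises [min, max]; the agreed version is the highest
+ * one supported by both peers, or negotiation fails and the connection
+ * must be dropped (and the failure logged). */
+static int
+idc_negotiate_version (unsigned int peer_min,
+                       unsigned int peer_max,
+                       unsigned int *agreed_version)
+{
+  unsigned int high = (peer_max < IDC_PROTOCOL_VERSION_MAX) ?
+                      peer_max : IDC_PROTOCOL_VERSION_MAX;
+  unsigned int low = (peer_min > IDC_PROTOCOL_VERSION_MIN) ?
+                     peer_min : IDC_PROTOCOL_VERSION_MIN;
+
+  if (high < low)
+    return -1;  /* no version in common */
+
+  *agreed_version = high;
+  return 0;
+}
+```
+
+With such a scheme, a version B CE carrying a newer protocol
+implementation can still fall back to the version spoken by an older AD,
+as required by the [][Protocol compatibility] use case.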
+
+### Navigation system
+
+A proprietary navigation system is running on the AD, with full access
+to the vehicle’s navigation hardware, including inertial sensors and a
+GPS receiver. A tour application on the CE wishes to use location-based
+services, reading the vehicle’s location from the navigation system on
+the AD, then requesting that the navigation service set its destination
+to a new location for the next place in the tour. It sends a stream of
+points of interest to the navigation system to display on the map while
+the driver is navigating. This stream is not high bandwidth; neither are
+the location updates from the GPS.
+
+### Marshalling resource usage
+
+The ‘proxy’ software on either side of the inter-domain connection which
+handles the low-level communication link is the first software in a
+domain to handle malicious input. If malicious input is sent to a domain
+with the intent of causing a denial of service in that software, the
+rest of the software in the domain should be unaffected, and should
+treat the connection as timing out or compromised. The behaviour of the
+proxy software should be confined so that it cannot use excess resources
+in the domain and hence extend the denial of service attack to the whole
+domain.
+
+### Feedback for malicious applications
+
+If an application uses SDK APIs incorrectly (for example, by providing
+parameters which are outside valid ranges), it may be reported to the
+Apertis store as a ‘misbehaving application’ and scheduled for further
+investigation and possible removal from the Apertis store. The same
+applies if the inter-domain communication APIs are used incorrectly (for
+example, if the AD returns an error stating that input validation checks
+have failed for an API call).
+
+This could also result in an application being blacklisted by the CE’s
+application manager, disallowing it from running in future until it is
+updated from the Apertis store.
+
+### Compromised CE with delayed fix
+
+An attacker has somehow completely compromised the CE operating system,
+and has root access to it. It will take the OEM a few weeks to produce,
+test and distribute a fix for the exploit used by the attacker, but
+vehicle owners would like to continue to use their vehicles, with
+reduced functionality (no CE domain) in the meantime, because the attack
+has not compromised the AD. The OEM has provided them with an
+authenticated method of informing the AD to shut down the CE and keep it
+shut down until an authenticated update has been applied which has fixed
+the exploit and removed the attacker from the CE (probably by
+overwriting the entire OS with a fresh copy). This update can only be
+applied at a garage, but in order to allow speedy deployment, the user
+can switch the AD to this stand-alone mode themselves, using a trusted
+input path to the AD.
+
+### Denial of service through flooding
+
+A speedometer application bundle constantly requests vehicle speed
+information from the AD. Hundreds of requests are made per second. The
+AD ensures this does not affect overall system performance, potentially
+at the cost of its responsiveness to the speedometer application’s
+requests.
+
+*(Note: This assumes that the corresponding denial of service rate
+limiting which is implemented in the SDK API used by the speedometer
+application has somehow failed or been bypassed.
+In reality, all SDK APIs are also responsible for implementing their own
+rate limiting as a first level of protection against denial of service
+attacks.)*
+
+### Malicious CE UI
+
+An attacker has somehow completely compromised the CE operating system,
+and has root access to it. They can display whatever they like on the
+graphics output from the CE, which is shared with that from the AD on a
+single screen. The attacker tries to replicate the AD UI on the CE’s
+output and trick the user into entering personal data or security
+credentials in this faked UI, believing it to be the actual AD UI. There
+should be a way for the user to determine whether they are inputting
+details via a trusted path to the AD.
+
+### Plug-and-play CE device
+
+In a particular vehicle, the CE device can be unplugged from the
+dashboard by the user, and passed around the car so that, for example, a
+rear seat passenger could play a game. This disconnects it from the AD,
+but it should continue to function with some features (such as Wi-Fi or
+Bluetooth) disabled until it is reconnected. Once reconnected to the
+dashboard, it should reestablish its connections. See also
+[][Power cycle independence of domains (CE down)],
+[][Power cycle independence of domains (AD down, single screen)] and
+[][Power cycle independence of domains (AD down, multiple screens)].
+
+*(Note: This is a much lower priority than other setups, but should
+still be considered as part of the overall design, even if the code for
+it will be implemented as a later phase.)*
+
+### Connecting an SDK to a development vehicle
+
+A developer is running the SDK as a standalone CE system in a virtual
+environment on a laptop. They connect the laptop to the AD physically
+installed in a development car using an Ethernet cable, and expect to
+receive sensor data from the car, using the sensors and actuators SDK
+API, which was previously returning mock results from the standalone
+system.
+
+#### Connecting an SDK to a production vehicle
+
+The developer wonders what would happen if they tried connecting their
+SDK laptop to the AD in a production vehicle. They try this, and nothing
+happens — they cannot get sensor data out of the vehicle, nor use any of
+its other APIs.
+
+## Security model
+
+See the [Security concept design] for general terminology, including
+the definitions used for *integrity*, *availability*, *confidentiality*
+and *trust*.
+
+### Attackers
+
+#### Vehicle’s owner
+
+The vehicle’s owner may be an attacker. They have physical access to the
+vehicle, including its in-vehicle network, the physical inter-domain
+communications link, and the board or boards which the automotive domain
+(AD) and consumer-electronics domain (CE) are on. We assume they do not
+have the capabilities to perform invasive attacks on silicon on the
+boards. Specifically, this means that in a virtualised setup where the
+AD and CE are run as separate virtual machines on the same CPU, we
+assume the attacker cannot read or modify the inter-domain
+communications link between them.
+
+However, we do assume that they can perform semi-invasive or
+non-invasive [attacks][ucam-cl-tr-630] on silicon on the boards. This means that they
+could (with difficulty) extract encryption keys from a secure key store
+on the board. A secure key store may be provided by the Secure Boot
+design, but may not be present due to hardware limitations — if so,
+the vehicle’s owner will be able to extract encryption keys from the
+device more easily.
+ +> As of February 2016, the Secure Boot design is still forthcoming + +The vehicle’s owner may wish to attack their vehicle in order to get +access to licenced content which they would otherwise have to pay +for. + +> See the [Conditional Access design] + +We assume they do not want to take control of the vehicle, or +to gain arbitrary code execution privileges — they can drive the vehicle +normally, or develop and choose to install their own application bundle +for this. + +#### Passenger + +The passenger is a special kind of third party attacker ([][Third parties]), +who additionally has access to the in-vehicle network. This may be +possible if, for example, the Apertis device in the vehicle is removable +so it can be passed to a passenger, exposing a connector behind it. + +The passenger may be trying to access confidential information belonging +to the vehicle owner (if a multi-user system is in use). + +#### Third parties + +Any third party may be an attacker. We assume they have physical access +to the exterior of the vehicle, but not to anything under the bonnet, +including the in-vehicle network, the physical inter-domain +communications link, and the board or boards which the domains are on. +This means that all garage mechanics must be trusted. They do, however, +have access to all communications into and out of the vehicle, including +Bluetooth, 4G, GPS and Wi-Fi. + +We assume any third party attacker can develop and deploy applications, +and convince the owner of a vehicle to install them. These applications +are subject to the normal sandboxing applied to any application +installed on an Apertis system. These applications are also subject to +the normal Apertis store validation procedures, but we assume that a +certain proportion of malicious applications may get past these +procedures temporarily, before being discovered and removed from the +store. + +We assume that a third party attacker does not have access to the +Apertis store servers. This means that all staff who have access to them +must be trusted. + +A third party attacker may be trying to: + + - Access confidential information belonging to the vehicle owner. + + - Compromise the integrity of the vehicle’s control system (the + automotive domain). For example, to trigger unintended acceleration + or to change the radio channel to spook the driver. + + - Compromise the integrity of the CE domain to, for example, make it + part of a botnet, or cause it to call premium rate numbers owned by + the attacker to generate money. + + - Compromise the availability of the vehicle’s control system (the + automotive domain) to bring the vehicle to a halt. + + - Compromise the availability of the vehicle’s infotainment system + (the CE domain) to cause a nuisance to the driver or passengers. + + - Compromise the confidentiality of the device key (see the + [Conditional Access design]) in order to extract licenced content + (for example, music) from application bundles. + +#### Trusted dealer + +As above, all authorized vehicle dealers, garages or other sale/repair +locations have to be trusted, as they have more unsupervised access to +the vehicle’s hardware, and more capabilities, than the vehicle owner, +passenger or a third party. 
+
+### Security domains
+
+  - Automotive domain
+
+      - There may be security sub-domains within the automotive domain,
+        but for the purposes of this design it is treated as a black box
+
+  - Consumer-electronics domain:
+
+      - Each application sandbox in the consumer-electronics domain
+
+      - CE domain operating system (this includes all the daemons for
+        the SDK APIs — these are technically separate security domains,
+        but since they communicate only with sandboxes and the CE domain
+        proxy, treating them separately makes the model more complex for
+        no analytical advantage)
+
+      - CE domain proxy for the inter-domain communication
+
+  - Connectivity domain:
+
+      - The Connectivity domain handles the communication between the AD
+        and the outside world
+
+      - Different protocol stacks
+
+      - CD domain proxy for communicating with the AD
+
+  - Other devices on the in-vehicle network, and the outside world
+
+  - Hypervisor (if running as virtualised domains)
+
+### Security model
+
+  - Domains must assume that the inter-domain communication link
+    has no confidentiality or integrity, and is controlled by an
+    attacker (a man in the middle with the ability to modify traffic)
+
+      - This means they must not trust any traffic from other devices on
+        the network
+
+  - The AD, CD and CE operating systems must assume all input from external
+    sources (Wi-Fi, Bluetooth, GPS, 4G, etc.) is malicious
+
+  - The CE operating system may assume all API calls from the AD (as
+    proxied by the CE proxy) are *not* controlled by an attacker,
+    assuming they have come over an authenticated channel which
+    guarantees integrity between the AD and CE proxy; in other words,
+    the AD must not deny confidentiality or integrity to the CE
+
+  - The AD may deny availability to the CE operating system, by closing
+    the inter-domain link in response to the user disabling the CE while
+    waiting for a critical security update
+
+  - The AD must assume all API calls from the CE are malicious, in case
+    the CE has been compromised
+
+  - The CE must assume that all input and output from third party
+    applications in sandboxes is malicious, including all their API
+    calls
+
+  - If a hypervisor is present:
+
+      - The AD and CE operating systems may assume all control calls
+        from the hypervisor are *not* controlled by an attacker
+
+      - The hypervisor must assume all input from the CE is malicious
+
+      - The hypervisor may assume that all input from the AD is *not*
+        malicious
+
+      - Note that, when combined with the fact that the AD cannot be
+        updated easily, this makes security bugs in the AD extremely
+        critical and extremely hard to fix
+
+  - Tampering with any domain software must be detectable even if it
+    is not preventable (tamper evidence)
+
+  - If one vehicle is attacked and compromised, the same effort must be
+    required to compromise other vehicles
+
+## Non-use-cases
+
+### Production CE domain used in multiple configurations
+
+A production CE domain operating system cannot be used in multiple
+configurations, for example as both an operating system running on one
+CPU of a two-CPU board shared with the automotive domain OS, and then as
+an image running on a separate board connected to an in-vehicle network
+with other devices connected.
+
+This requirement would mean that the inter-domain communications system
+would have to support runtime reconfiguration, which would be a vector
+for protocol-downgrade attacks while bringing no major benefits.
+An attacker could try to trick the CE domain into believing it was in (for
+example) a virtualised configuration when it wasn’t, which could
+potentially disable its encryption, due to the assumption the domain
+could make about its inter-domain communications link having inbuilt
+confidentiality.
+
+## Requirements
+
+### Separated transport layer
+
+The transport layer for transmitting inter-domain communications between
+the domains must be separated from the APIs being transported, in order
+to allow for different physical links between the domains, with
+different security properties.
+
+#### Transport to SDK APIs
+
+Support a configuration where the CE is running in a virtual machine
+with the Apertis SDK, so the peer (which would normally be the AD) is a
+mock AD daemon running against the SDK.
+
+See [][Standalone setup].
+
+#### Transport over virtio
+
+Support a configuration where the CE and AD communicate over a virtio
+link between two virtual machines under a hypervisor.
+
+See [][Basic virtualised setup].
+
+#### Transport over a private Ethernet link
+
+Support a configuration where the CE and AD are on separate CPUs and
+communicate over a point-to-point Ethernet link.
+
+See [][Separate CPUs setup], [][Separate boards setup].
+
+#### Transport over a private Ethernet link to a development vehicle
+
+Support a configuration where the CE is running in an SDK on a laptop,
+and the AD is running in a developer-mode Apertis device in a vehicle,
+and the two communicate over a wider shared Ethernet.
+
+See [][Connecting an SDK to a development vehicle].
+
+#### Transport over a shared Ethernet link
+
+Support a configuration where the CE and AD are on separate CPUs and are
+both connected to some wider shared Ethernet.
+
+See [][Separate boards setup with other devices], [][Multiple CE domains setup].
+
+#### Transport over Unix Domain Socket
+
+Support a configuration where the AD and CE are on the same host, running as
+Linux containers and connected via UDS. The same transport can be used in
+OEM deployments and in SDK environments.
+
+See [][Linux container setup], [][Multiple CE domains setup].
+
+### Message integrity and confidentiality in transport layer
+
+Some of the possible physical links between domains do not guarantee
+integrity or confidentiality of messages, so these must be implemented
+in the software transport layer.
+
+See [][Separate CPUs setup], [][Separate boards setup],
+[][Separate boards setup with other devices], [][Multiple CE domains setup],
+[][Wi-Fi access].
+
+### Reliability and error checking in transport layer
+
+Some of the possible physical links between domains do not guarantee
+reliable or error-free transfer of messages, so these must be
+implemented in the software transport layer.
+
+See [][Separate boards setup], [][Separate boards setup with other devices],
+[][Multiple CE domains setup].
+
+### Mutual authentication between domains
+
+An attacker may interpose on the inter-domain communications link and
+attempt to impersonate the AD to the CE, or the CE to the AD. The
+domains must mutually authenticate before accepting any messages from
+each other.
+
+See [][Tinkering vehicle owner on the network].
+
+### Separate authentication for developer and production mode devices
+
+A CE running in an SDK must be able to connect to and authenticate with
+an AD running in a vehicle which is in a special ‘developer mode’. If
+the same CE is connected to a production vehicle, it must not be able to
+connect and authenticate.
+
+See [][Connecting an SDK to a development vehicle],
+[][Connecting an SDK to a production vehicle].
+
+### Individually addressed domains
+
+In order to support multiple CE domains using the same automotive
+domain, each domain (consumer-electronics and automotive) must be
+individually addressable. The system must not assume that there are only
+two domains in the network.
+
+See [][Multiple CE domains setup].
+
+### Traffic control for latency
+
+In order to support delivery of touchscreen events with low latency (so
+that UI responsiveness is not perceptibly slow for the user), the system
+must guarantee a low latency for all communications, or provide a
+traffic control system to allow certain messages (for example,
+touchscreen messages) to have a guaranteed latency.
+
+See [][Touchscreen events].
+
+### Traffic control for bandwidth
+
+In order to prevent some kinds of high bandwidth message from using all
+the bandwidth provided by the physical link, the system must provide a
+traffic control system to ensure all types of message have fair access
+to bandwidth (where ‘fairness’ is measured according to some rigorous
+definition).
+
+This may be implemented by separating ‘control’ and ‘data’ streams (see
+[][Control stream] and [][Data stream]), or by applying traffic control
+algorithms.
+
+See [][Wi-Fi access], [][Bluetooth access].
+
+### Traffic control for frequency
+
+In order to prevent denial of service due to a service sending too many
+messages at once (so the communication overheads of those messages start
+to dominate bandwidth usage), the system must guarantee fair access to
+enqueue messages. This is subtly different from fair access to
+bandwidth: service A sending 100000 messages of 1KB per second and
+service B sending 1 message of 100000KB per second have the same
+bandwidth requirements; but if the inter-domain link saturates at
+100000KB per second, some of the messages from service A must be delayed
+or dropped, as the messaging overheads exceed the bandwidth limit.
+
+See [][Denial of service through flooding].
+
+### Separation of control and data streams
+
+Certain APIs will need to provide data and control streams separately,
+with different latency and bandwidth requirements for each. The system
+must support multiple streams; this may be via an explicit separation
+between ‘control’ and ‘data’ streams, or by applying traffic control
+algorithms.
+
+See [][Wi-Fi access], [][Bluetooth access], [][Audio transfer], [][Video decoding].
+
+### No untrusted access to AD hardware
+
+The entire point of an inter-domain communication system is to isolate
+the CE from direct access to sensitive hardware, such as vehicle
+actuators or hardware with direct memory access (DMA) rights to the AD
+CPU’s memory. This must apply equally to decoder hardware — decoders or
+other hardware handling untrusted data from users must not be trusted by
+the AD if the CE can send untrusted user data to it, unless it is
+certified as a security boundary, able to handle malicious user input
+without being exploited.
+
+Specifically, this means that hardware decoders must only access memory
+which is accessible by the AD CPU via an input/output memory management
+unit (IOMMU), which provides memory protection between the two, so that
+the hardware decoder cannot access arbitrary parts of memory and proxy
+that access to a malicious or compromised application in the CE.
+
+Note that it is not possible to check audio or video content for
+‘badness’ before sending it to a decoder, as that entails doing the
+full decoding process anyway.
+
+See [][Audio transfer], [][Video decoding], [][Video or audio decoder bugs],
+[][Connecting an SDK to a production vehicle].
+
+### Trusted path for users to update the CE operating system
+
+There must exist a trusted path from the user to the system updater in
+the CE, or to a component in the AD which will update the CE. The user
+must always have access to this update system (it must always be
+*available*).
+
+This trusted path may also be used by garages to upgrade the CE when
+servicing a vehicle; or a different path may be used.
+
+See [][Video or audio decoder bugs], [][After-market upgrades],
+[][Malicious CE UI].
+
+### Safety limits on AD APIs
+
+The automotive domain must apply suitable safety limits to all of its
+APIs, which are enforced within the AD, so that even if a properly
+authenticated and trusted CE makes an API call, it is ignored if the
+call would make the AD do something unsafe.
+
+In this case, ‘safety’ is defined differently for each actuator or
+combination of actuator settings, and will vary between AD
+implementations. It might not be possible to detect all unsafe
+situations (in the sense of an unsafe situation which could lead to an
+accident).
+
+See [][Tinkering vehicle owner on the boards], [][Malicious CE].
+
+### Rate limiting on control messages
+
+The inter-domain service in the CE and AD should impose rate limiting on
+control messages coming from the CE, to prevent a compromised service in
+the CE from using a denial of service attack to stop other messages
+being transmitted successfully.
+
+This should be in addition to rate limiting implemented in the SDK APIs
+in the CE themselves, which are expected to be the first line of defence
+against denial of service attacks.
+
+See [][Denial of service through flooding].
+
+### Ignore unrecognised messages
+
+Both the CE and AD must ignore (and log warnings about) inter-domain
+communication messages which they do not recognise. If the message
+expects a reply, an error reply must be sent. The domains must not, for
+example, shut down or crash when receiving an unrecognised message, as
+that would lead to a denial of service vulnerability.
+
+See [][Tinkering vehicle owner on the boards], [][Malicious CE].
+
+### Portable transport layer
+
+The transport layer must be portable to a variety of operating systems
+and architectures, in order that it may be used on different AD
+operating systems. This means, for example, that it must not depend on
+features added to very recent versions of the Linux kernel, or must have
+fallback implementations for them.
+
+See [][Support multiple AD operating systems].
+
+### Support push mode and pull mode communications
+
+The CE must be able to use pull mode communications with the AD, where
+it makes a method call and receives a reply; and push mode
+communications, where the AD emits a signal for an event, and the CE
+receives this.
+
+See [][Support multiple AD operating systems].
+
+### OEM AD integration API
+
+In order to allow any OEM to connect their AD to the system, there must
+be a well defined API to which they connect their OEM-specific APIs for
+vehicle functionality, in order for that functionality to be exposed
+over the inter-domain communication link.
+
+This API must support an implementation which uses the services in the
+Apertis SDK.
+
+See [][Support multiple AD operating systems], [][Standalone setup].
+
+### Flexibility in OEM AD integration API
+
+As the functionality exported by different ADs differs, the integration
+API for connecting it to the inter-domain communication system must be a
+general one — it must not require certain functionality or data types,
+and must support functionality which was not initially expected, or
+which is not currently supported by any CE. This functionality should be
+exposed on the inter-domain communications link, in case future versions
+of the CE can take advantage of it.
+
+See [][Support multiple AD operating systems], [][Before-market upgrades],
+[][After-market upgrades], [][New version of AD software], [][New version of AD interfaces].
+
+### Inflexibility in OEM AD integration API
+
+The OEM AD integration API must not allow access to arbitrary services
+or APIs on the AD. It must only allow access to the services and APIs
+explicitly exposed by the OEM in their use of the integration API.
+
+See [][Unsupported AD interfaces].
+
+### Service discovery
+
+Domains should be able to detect where specific services are hosted in
+the case of multiple CE domains. If a service is moved from one CE domain
+to another CE domain, other domains should not require any reconfiguration.
+CE domains should not be able to spoof services that are meant to be
+provided by the AD.
+
+### Stability in inter-domain communications protocol
+
+As the versions of the AD and CE change at different rates, the
+inter-domain communications protocol must be well defined and stable —
+it must not change incompatibly between one version of the CE and the
+next, for example.
+
+If the protocol uses versioning to add new features, both domains must
+support protocol version negotiation to find a version which is
+supported by both if the latest one is not.
+
+See [][Before-market upgrades], [][After-market upgrades],
+[][New version of AD software], [][Unsupported AD interfaces], [][Protocol compatibility].
+
+### Testability of protocols
+
+All IPC links in the inter-domain communications system must be testable
+individually, without requiring the other parts of the system to be
+running. For example, the link between applications and SDK API services
+must be testable without running an automotive domain; the link between
+SDK API services and the inter-domain interface at the boundary of the
+CE domain must be testable without running an automotive domain; etc.
+
+See [][Testability], [][New version of AD software], [][Unsupported AD interfaces].
+
+### Testability of protocol parsers and writers
+
+All protocol parsers and writers in the inter-domain communications
+system must be testable individually, using unit tests and test vectors
+which cover all facets of the protocol. These tests must include
+negative tests — checks that invalid input is correctly rejected. For
+example, if a protocol requires a certificate to authenticate a peer, a
+test must be included which attempts a connection with different types
+of invalid certificate.
+
+See [][Testability], [][New version of AD software], [][Unsupported AD interfaces].
+
+### Testability of processes
+
+The code implementing all processes in the inter-domain communications
+system must be testable individually, without having to run each process
+as a subprocess in a test harness (because this makes testing slower and
+error prone).
+This means implementing each process as a library, with a
+well defined and documented API, and then using that library in a
+trivial wrapper program which hooks it up to input and output streams
+and accepts command line arguments.
+
+See [][Testability], [][New version of AD software], [][Unsupported AD interfaces].
+
+### CE system services separated from transport layer
+
+There must be a trust boundary between each service on the CE which has
+access to the inter-domain communication link, and the service which
+provides access to the inter-domain communications link itself. The
+inter-domain service should validate that messages from a service are
+related to that service (for example, by having a whitelist of the types
+of message which each service can send).
+
+This limits the potential for escalation if service A is exploited —
+the attacker can then only use the inter-domain service to impersonate
+A, rather than to impersonate all services in the CE. It also allows the
+resource usage of the inter-domain service to be limited, to limit the
+impact of a denial of service attack on it.
+
+See [][Malicious CE], [][Marshalling resource usage].
+
+### No dependency on CE specific hardware
+
+As the CE hardware may be upgraded by a garage at some point, the
+inter-domain communications should not depend on specific identifiers in
+this hardware, such as an embedded cryptographic key. Such keys may be
+used, but the AD should accept multiple keys (for example, all keys
+signed by some overall key provided by Apertis to all OEMs), rather than
+only accepting the specific key from the hardware it was originally run
+against.
+
+This requirement may also be satisfied by including provisions for
+updating the copy of a key in the AD, if such a dependency on a specific
+CE key is a sensible implementation choice.
+
+See [][After-market upgrade of a domain].
+
+### Immediate error response if service on peer is unavailable
+
+If a service on the peer has crashed or is unresponsive, but the peer
+itself (including its inter-domain communications link) is still
+responsive, that peer should return an error to the other domain, which
+should propagate it to any caller of SDK APIs which use the failing
+service. An error response must be returned, otherwise the caller will
+time out.
+
+See [][Power cycle independence of domains (CE down)],
+[][Power cycle independence of domains (AD down, single screen)],
+[][Power cycle independence of domains (AD down, multiple screens)],
+[][Plug-and-play CE device].
+
+### Immediate error response if peer is unavailable
+
+If the peer has crashed, or is not currently connected to the physical
+inter-domain communications link (either because it has been unplugged
+or due to a fault), the other peer must generate a local error response
+in the inter-domain service and return that to any caller of SDK APIs
+which require inter-domain communications. An error response must be
+returned, otherwise the caller will time out.
+
+See [][Power cycle independence of domains (CE down)],
+[][Power cycle independence of domains (AD down, single screen)],
+[][Power cycle independence of domains (AD down, multiple screens)],
+[][Plug-and-play CE device].
+
+### Timeout error response if peer does not respond
+
+If the peer is unresponsive to a particular inter-domain message, the
+other peer must generate a local error response in the inter-domain
+service and return that to the caller of the SDK API which required
+inter-domain communications.
An error response must be returned,
+otherwise the caller will wait for a response indefinitely (or have to
+implement its own timeout logic, which would be redundant).
+
+See [][Power cycle independence of domains (CE down)],
+[][Power cycle independence of domains (AD down, single screen)],
+[][Power cycle independence of domains (AD down, multiple screens)],
+[][Plug-and-play CE device].
+
+### All inter-domain communications APIs are asynchronous
+
+As inter-domain communications may have some latency, or may time out
+after a number of seconds, all SDK APIs which require inter-domain
+communications must be asynchronous, in the [GLib sense][GAsyncResult]: the call
+must be started, a handler for its response added to the caller’s main
+loop, and the caller must continue with other tasks until the response
+arrives from the other domain.
+
+This encourages UIs to be written so that they do not block on SDK API
+calls which might take multiple seconds to complete, as during that
+time, the UI would not be redrawn at all, and hence would appear to
+‘freeze’.
+
+See [][Temporary communications problem].
+
+### Reconnect to peer as soon as it is available
+
+If a domain has crashed and restarted, or was disconnected from the
+inter-domain communications link and then reconnected, the domain must
+reconnect to its peer as soon as the peer can be found on the network.
+If, for example, both domains had crashed, this may involve waiting for
+the peer to connect to the network itself.
+
+See [][Plug-and-play CE device].
+
+### External domain watchdog
+
+Both domains must be connected to an external watchdog device which will
+restart them if they crash and fail to restart themselves.
+
+The watchdog must be external, rather than being the other domain, in
+case both domains crash at the same time.
+
+See [][Power cycle independence of domains (CE down)],
+[][Power cycle independence of domains (AD down, single screen)],
+[][Power cycle independence of domains (AD down, multiple screens)].
+
+### Reporting system for malicious applications
+
+There should exist a trusted path from the application launcher in the
+CE to the Apertis store to allow the launcher to provide feedback about
+applications which are detected to have done ‘malicious’ things, such as
+calling an SDK API with parameters which are obviously out of range.
+
+If such a path exists, the inter-domain service in the CE must be able
+to detect error responses from the AD which indicate that malicious
+behaviour has been detected and rejected, and must be able to forward
+those notifications to the reporting system.
+
+See [][Feedback for malicious applications].
+
+### Ability to disable the consumer–electronics domain
+
+There must exist a trusted path to a setting in the AD to allow the
+vehicle owner to disable the CE because it has been compromised, pending
+taking the vehicle to a trusted dealer to install an update.
+
+As well as preventing the CE from booting, this must disable all
+inter-domain communications from within the inter-domain service in the
+AD.
+
+See [][Compromised CE with delayed fix].
+
+### Tamper evidence
+
+If the CE or AD, or the communications between them, are tampered with
+by an attacker, it must be possible for an investigator (who is trusted
+by and has access to tools provided by the OEM) to determine that the
+software or hardware was modified — although it might not be possible
+for them to determine *how* it was modified. This will allow for
+liability to be attributed in the event of an accident or warranty
+claim.
+
+See [][Tinkering vehicle owner on the network],
+[][Tinkering vehicle owner on the boards].
+
+### No global keys in vehicles
+
+The security which protects the inter-domain communication system
+(including any trusted boot security) must use unique keys for each
+vehicle, and must not have a global key (one which is the same in all
+vehicles) as a single point of failure.
+
+This means that if an attacker manages to compromise one vehicle, they
+must not be able to learn anything (any keys) which would allow them to
+compromise another vehicle with less effort.
+
+See [][Tinkering vehicle owner on the network],
+[][Tinkering vehicle owner on the boards].
+
+## Existing inter-domain communication systems
+
+As this is quite an unusual problem, we know of no directly comparable
+systems. More generally, this is an instance of a distributed system,
+and hence similar in some respects to a number of existing remote
+procedure call systems or distributed middleware systems.
+
+If comparisons with specific systems would be beneficial, they can be
+included in a future revision of this document.
+
+**Open question**: Are there any relevant existing systems to compare
+against?
+
+## Approach
+
+Based on the [above research][Existing inter-domain communication systems] and [][Requirements], we
+recommend the following approach as an initial sketch of an inter-domain
+communication system.
+
+### Overall architecture
+
+In the following figure, each box represents a process, and hence each connection
+between them is a trust boundary.
+
+> Apertis IDC architecture. The ‘OEM specific’ APIs are also known as ‘native OEM APIs’;
+> and the ‘OEM API’ is also known as the ‘Apertis automotive API’.
+> For more information on the export and adapter layers, see [][Automotive domain export layer]
+> and [][Consumer-electronics domain adapter layer].
+
+APIs from the automotive domain are exported by an *export layer*
+([][Automotive domain export layer]) as D-Bus objects on the inter-domain communications link.
+This link runs a known version of the D-Bus protocol (and requires
+backwards compatibility indefinitely) between an *inter-domain service*
+process in each domain ([][Protocol library and inter-domain services]). The inter-domain service in the CE
+domain sends and receives D-Bus messages for the objects exported by the
+automotive domain, and proxies them to a private bus in the CE domain.
+SDK services in the CE domain connect to this bus, and an *adapter
+layer* ([][Consumer-electronics domain adapter layer])
+in each service converts the APIs from the
+automotive domain to the SDK APIs used in the version of Apertis in use
+in the CE domain. These SDK APIs are exported onto the normal D-Bus
+session bus, to be used by applications ([][Flow for a given SDK API call]).
+
+The export layer and adapter layer provide abstraction of the APIs from
+the automotive domain: the export layer converts them from C APIs, QNX
+message passing, or however they are implemented in the automotive OS,
+to a D-Bus API which is specific to that OEM, but which has stability
+guarantees through use of API versioning ([][Interaction of the export and adapter layers]).
+The adapter layer converts from this D-Bus API to the current version of the Apertis
+SDK APIs. Both layers are OEM-specific.
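+
+As an illustration of this layering, the following minimal sketch shows
+an adapter-layer handler forwarding one SDK API call to a hypothetical
+OEM API; the bus address, interface names and method names are
+assumptions made for this sketch, not part of the design:
+
+---
+# A sketch of an adapter-layer handler, assuming a hypothetical OEM API
+# (com.exampleoem.CarOS.ClimateControl1) proxied on a private bus, and a
+# hypothetical SDK API (org.apertis.ClimateControl1) on the session bus.
+import gi
+gi.require_version("Gio", "2.0")
+from gi.repository import Gio, GLib
+
+# Private bus where the inter-domain service proxies the OEM API
+# objects; the address is an assumption for this sketch.
+oem_bus = Gio.DBusConnection.new_for_address_sync(
+    "unix:path=/run/idc/private-bus",
+    Gio.DBusConnectionFlags.AUTHENTICATION_CLIENT |
+    Gio.DBusConnectionFlags.MESSAGE_BUS_CONNECTION,
+    None, None)
+
+def handle_sdk_call(conn, sender, path, iface, method, params, invocation):
+    if method == "GetTemperature":
+        # Translate the SDK call into the OEM-specific, versioned API.
+        result = oem_bus.call_sync(
+            "com.exampleoem.CarOS",
+            "/com/exampleoem/CarOS/ClimateControl1",
+            "com.exampleoem.CarOS.ClimateControl1",
+            "GetCabinTemperature",
+            None, None, Gio.DBusCallFlags.NONE, -1, None)
+        invocation.return_value(result)
+
+SDK_XML = ("<node><interface name='org.apertis.ClimateControl1'>"
+           "<method name='GetTemperature'>"
+           "<arg type='d' direction='out'/></method></interface></node>")
+session = Gio.bus_get_sync(Gio.BusType.SESSION, None)
+session.register_object("/org/apertis/ClimateControl1",
+                        Gio.DBusNodeInfo.new_for_xml(SDK_XML).interfaces[0],
+                        handle_sdk_call, None, None)
+GLib.MainLoop().run()
+---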
+
+The use of the D-Bus protocol throughout the system means that between
+the export layer and the adapter layer, message contents do not need to
+be remarshalled — messages only need their headers to be changed before
+they are forwarded. This should eliminate a common cause of poor
+performance (remarshalling).
+
+High-bandwidth [][Data connections] are provided in
+parallel with the *control connection* which runs this D-Bus protocol
+([][Control protocol]). They use TCP, UDP or Unix sockets, and are opened between the two
+inter-domain services on request. Applications and services must define
+their own protocols for communicating over these links, which are
+appropriate to the data being transferred (for example, audio data or a
+Bluetooth file transfer).
+
+Authentication, confidentiality and integrity of all inter-domain
+communications (the control connection and data connections) are
+provided by using IPsec as the bottom layer of the protocol stack
+([][Encryption]). The same protocol stack is used for all configurations
+of the two domains (from a standalone CE domain through to multiple CE
+domains on a shared network with an automotive domain), to ensure that
+the same code path is used for all configurations and hence is widely
+tested ([][Configuration designs]).
+
+Addressing and discovery of domains, before the initial connection
+between them, is provided by IPv6 neighbour discovery ([][Addressing and peer discovery]).
+
+Traffic control is implemented in the CE domain using standard Linux
+kernel traffic control mechanisms, with the policy specified by the
+inter-domain service ([][Traffic control]). It is applied for the control
+connection and for each data connection separately, as they are all
+separate TCP or UDP connections.
+
+The only exception to the above is the [][Linux container setup],
+which uses Unix Domain Sockets as a trusted and reliable bottom transport layer
+instead of IPsec. In this case, there is no need for traffic control.
+Addressing and discovery of local domains in the [][Linux container setup] are
+based on common directories created and shared outside of the containers by
+the container manager.
+
+> Responsibilities for areas of code in the IDC architecture
+
+### Security domains
+
+As process boundaries are the only way of enforcing trust boundaries,
+each of these security domains corresponds to at least one separate
+process in the system.
+
+  - Inter-domain service in the automotive domain. We recommend that
+    this remains a separate security domain from the rest of the
+    services and software running in the AD. This allows it to be
+    isolated from other components to reduce the attack surface exposed
+    by the AD.
+
+  - Rest of the automotive domain: as mentioned in [][Security domains], the
+    automotive domain is essentially a black box.
+
+  - Each application sandbox in the consumer–electronics domain.
+
+  - Inter-domain service in the consumer–electronics domain.
+
+  - Each service for an SDK API in the consumer–electronics domain. The
+    trust boundaries between them may not be enforced strongly (as all
+    services in the consumer–electronics domain are considered as
+    trusted parts of the operating system), but their trust boundaries
+    with the inter-domain service should be enforced, and the
+    inter-domain service should consider them as potentially
+    compromised.
+
+  - Other devices on the in-vehicle network, and the outside world.
+
+  - Hypervisor (if running as virtualised domains).
+
+### Protocol design
+
+The protocol for communicating data between the domains has two
+*planes*: the control plane, and the data plane. They have different
+requirements, but both require addressing, routing, mutual
+authentication of peers, confidentiality of data and integrity of data.
+In addition, the control plane must have bi-directional, in-order
+transmission, framing, reliability and error detection. Conversely, the
+data plane must have multiplexing, and the ability to apply traffic
+control to each of its connections ([][Traffic control]).
+
+The control plane is used for sending control data between the domains —
+these are the method calls which form the majority of inter-domain
+communications. They require low latency, and are low bandwidth. The
+[control protocol][Control protocol] itself provides push and pull method
+call semantics, and allows for new data connections ([][Data connections]) to
+be opened. Only one control connection exists between a pair of domains,
+and it is always connected.
+
+The data plane is used for high-bandwidth data, such as video or audio
+streams, or Wi-Fi, 4G or Bluetooth downloads. The latency requirements
+are variable, but all connections are high bandwidth. The inter-domain
+communication system provides a plain stream for each data plane
+connection, and services must implement their own protocol on top which
+is appropriate for the specific type of data being transmitted (for
+example, audio or video streaming; or Wi-Fi downloads). Data connections
+are created between two domains on demand, and are closed after use.
+
+#### IPsec versus TLS
+
+An important design decision is whether to use [IPsec] or [TLS]
+(and DTLS) for providing the security properties of the inter-domain
+connection.
+
+If IPsec is used (following figure), it forms the bottom layer of the protocol
+hierarchy, and implements addressing, routing, mutual authentication,
+confidentiality and integrity for *all* connections in the control and
+data planes.
+
+> Protocol stacks for control and data planes if using IPsec.
+
+If TLS is used (following figure), it forms the layer just below the application
+protocols in the protocol hierarchy — the control plane would use a
+single TLS over TCP connection; and the data plane would use multiple
+TLS over TCP or DTLS over UDP connections. TLS (and hence DTLS — they
+have the same security properties) implements mutual authentication,
+confidentiality and integrity, but only for a single connection; each
+new connection needs a new TLS session.
+
+The chief advantage of IPsec is its transparency: any protocol can be
+tunnelled using it, without needing to know about the security
+properties it has. However, to do this, IPsec needs to be supported by
+both the AD and CE kernels. Some automotive operating systems may not
+support IPsec (although, as a data point, QNX seems to).
+
+> Protocol stacks for control and data planes if using TLS.
+
+A [2003 review of the IPsec protocol][crypto-eval] identified a number of
+problems with it. However, since then, it has been updated by [RFC 4301],
+[RFC 6040] and [RFC 7619]. These should be evaluated and the
+overall protocol security determined. In contrast, the security of TLS
+has been well studied, especially in recent years after the emergence of
+various vulnerabilities in it. TLS has the advantage that it is a
+smaller set of protocols than IPsec, and hence easier to study.
+
+**Open question**: What is the security of the IPsec protocol in its
+current (2015) state?
+ +Performance-wise, TLS requires a handshake for each new connection, +which imposes connection latency of at least one round trip (assuming +use of [TLS session resumption][RFC 5077]) for each new connection (on top of +other latency such as the TCP handshake). It is not possible to use a +single TLS session and multiplex connections within it, as this puts the +protocol reliability (TCP retransmission) below the multiplexing in the +protocol stack, which makes the multiplexed connection prone to [head of +line blocking], which seriously impacts performance, and allows one +connection to perform a denial of service attack on all others it is +multiplexed with. IPsec has the advantage of not requiring this +handshake for each connection, which significantly reduces the latency +of creating new connections, but does not affect their overall bandwidth +once they have reached a steady state. + +**Open question**: What is the performance of TCP and UDP over IPsec, +TLS over TCP and DTLS over UDP on the Apertis reference hardware? + +Overall, we recommend using IPsec if it is expected to be supported by +all automotive domain operating systems which will be used with Apertis +systems. Otherwise, if an AD OS might not support IPsec, we recommend +using TLS over TCP and DTLS over UDP for *all* configurations. We do +*not* recommend providing a choice for OEMs between IPsec and TLS, as +this doubles the possible configurations (and hence testing) of a part +of the system which is both complex and security critical. + +The remainder of this document assumes that IPsec is chosen. Throughout, +please read ‘IPsec’ as meaning ‘the IPsec protocol stack or the TLS +protocol stack’. + +#### Configuration designs + +The physical links available between the domains differ between +configurations of the domains, as do their properties. For some +configurations ([][Standalone setup], [][Basic virtualised setup], +[][Linux container setup]) +confidentiality and integrity of the inter-domain communications +protocol are not strictly necessary, as the physical link itself cannot +be observed by an attacker. However, for the other configurations, these +two properties are important. + +Since the first two configurations are the ones which are typically used +for development, we suggest implementing confidentiality and integrity +for them anyway, regardless of the fact it’s not strictly necessary. +This avoids the situation where the code running on production +configurations is vastly different from that running on development +configurations. Such a situation often leads to inadequate testing of +the production code. + +This should be weighed against the potential performance gains from +eliminating encryption from those connections, and the potential gains +in debuggability (for the [][Standalone setup] and +[][Linux container setup]) by being able to inspect +network traffic without needing to extract the encryption key. + +**Open question**: What trade-off do we want between performance and +testability for the different transport layer configurations? + +###### Standalone setup + +IPsec running on a [loopback interface] to a service running in the +SDK which mocks up the inter-domain service running in the AD. The +security properties it provides are technically not needed, as the +standalone setup is for development and is ignored by the security +model. + +Even though there are only two peers communicating, they will both have +and use a full addressing scheme ([][Addressing and peer discovery]). 
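+
+As an illustration, a mock inter-domain service for the standalone
+setup could be as simple as a D-Bus peer-to-peer server on the loopback
+interface. This is a minimal sketch for development only; the port
+number and the use of anonymous authentication in place of IPsec are
+assumptions:
+
+---
+# A development-only mock of the AD inter-domain service, accepting
+# D-Bus peer-to-peer connections on loopback.
+import gi
+gi.require_version("Gio", "2.0")
+from gi.repository import Gio, GLib
+
+server = Gio.DBusServer.new_sync(
+    "tcp:host=127.0.0.1,port=55556",
+    Gio.DBusServerFlags.AUTHENTICATION_ALLOW_ANONYMOUS,
+    Gio.dbus_generate_guid(),
+    None, None)
+
+connections = []
+
+def on_new_connection(server, connection):
+    # Mock OEM API objects would be registered on 'connection' here.
+    connections.append(connection)  # keep the connection alive
+    return True
+
+server.connect("new-connection", on_new_connection)
+server.start()
+GLib.MainLoop().run()
+---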
+
+###### Basic virtualised setup
+
+A virtio-net connection must be set up in the CE and AD virtual
+guests, using a private network containing those two peers. If the AD
+cannot be modified to enable a virtio-net connection, a normal
+virtualised Ethernet connection must be used.
+
+> Virtio-net is the name of the KVM paravirtualised network driver
+> ([*http://www.linux-kvm.org/page/Virtio*](http://www.linux-kvm.org/page/Virtio)).
+> Similar paravirtualised drivers exist for most hypervisors, so an
+> appropriate one for the hypervisor should be used. For simplicity,
+> this document will use ‘virtio-net’ to refer to them all.
+
+In either case, the transport layer will use IPsec between the two. The
+security properties it provides are technically not needed for a
+virtualised configuration, as the security model guarantees that the
+hypervisor maintains confidentiality and integrity of the connection.
+
+Even though there are only two peers on the network, they will both have
+and use a full addressing scheme ([][Addressing and peer discovery]).
+
+###### Separate CPUs setup
+
+A normal Ethernet connection must be used to connect the AD and CE on a
+private network. IPsec will be used over this Ethernet link, providing
+all the necessary transport layer properties.
+
+Even though there are only two peers on the network, they will both have
+and use a full addressing scheme, described below.
+
+###### Separate boards setup
+
+Same as for the separate CPUs setup.
+
+###### Separate boards setup with other devices
+
+Same as for the separate CPUs setup.
+
+###### Multiple CE domains setup
+
+Same as for the separate CPUs setup. Each domain’s address must be
+unique, and the use of addressing in this configuration becomes
+important.
+
+###### Linux container setup
+
+The communication is based on Unix Domain Sockets (UDS) shared between
+the counterpart domains; this means that a common directory must be shared
+for each pair of communicating domains. This directory must be writable by
+at least one container, such that its gateway layer or adapter layer can
+create the named Unix domain socket file and listen on it, and must be
+readable by the other container, which will connect to the shared named Unix
+domain socket file. The dedicated shared directory for communication may
+enforce limits on space usage and inode creation (for example, via a
+dedicated `tmpfs` mount or a `btrfs` subvolume quota) to prevent denial of
+service through filesystem space exhaustion.
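+
+As a concrete illustration of this directory and socket layout (the
+full startup flow is detailed in [][Container-based addressing and peer discovery]),
+a minimal sketch for the container holding the writable mount might
+look as follows; the paths are examples following the `${IDC_DIR}`
+convention described below:
+
+---
+# A sketch of the Unix socket setup in the container which has the
+# writable mount; the read-only peer waits for the file and connects.
+import os
+import socket
+
+path = "/var/lib/idc/connectivity/socket"  # example path
+
+try:
+    os.unlink(path)        # remove any stale socket file
+except FileNotFoundError:
+    pass
+
+listener = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
+listener.bind(path)        # would fail in the read-only peer container
+listener.listen(1)
+conn, _ = listener.accept()
+
+# The peer with the read-only mount instead does:
+#     peer = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
+#     peer.connect("/var/lib/idc/automotive/socket")
+---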
+
+The container manager is responsible for the actions below when each
+container is started or stopped:
+
+  - a shared storage space (a size-constrained `tmpfs` mount or `btrfs`
+    subvolume) must be defined for each pair of containers on the host system,
+    for instance `${IDC_HOST_DIR}/automotive-connectivity` for the link
+    connecting the `automotive` and `connectivity` domains
+
+  - the shared storage must be mounted by the container manager with
+    read/write permissions on the first domain of the pair, for instance
+    as `${IDC_DIR}/connectivity` in the `automotive` domain
+
+  - the same shared storage must be mounted by the container manager with read
+    permissions on the second domain of the pair, for instance as
+    `${IDC_DIR}/automotive` in the `connectivity` domain
+
+  - when the container is stopped, the shared storage and mounts
+    associated with the container must be unmounted
+
+The variables `${IDC_HOST_DIR}` and `${IDC_DIR}` mentioned above represent the
+paths where the shared spaces are mapped on the host and container filesystems,
+respectively. By default, both `${IDC_HOST_DIR}` and `${IDC_DIR}`
+are defined as `/var/lib/idc/`. An OEM or developer setup may redefine
+these paths for its customised environment.
+
+#### Addressing and peer discovery
+
+##### Network addressing and peer discovery
+
+Each domain will be identified by its IPv6 address, and domains will be
+discovered using the IPv6 protocol’s secure [neighbour discovery]
+protocol. As domains do not need to be human-addressable (indeed,
+the users of the vehicle need never know that it has multiple domains
+running in it), there is no need to use DNS or mDNS for addressing.
+
+The neighbour discovery protocol includes a feature called neighbour
+unreachability detection, which should be used as one method of
+determining that one of the domains has crashed. When a domain crashes,
+the other domain should poll for its existence on the network at a
+constant frequency (for example, at 2Hz) until it reappears at the same
+address as before. This frequency of polling is a trade-off between not
+flooding the network with connectivity checks and detecting the
+reappearance of the domain rapidly.
+
+When reconnecting to a restarted domain, the normal authentication
+process should be followed, as if both domains were starting up
+normally. There is no state to restore for the inter-domain link itself
+but, for example, SDK services may wish to re-query the automotive
+domain for the current vehicle state after reconnecting. They should do
+this after receiving an error response from the AD for an inter-domain
+communication which indicated that the other domain had crashed. Such
+behaviour is up to the implementers of each SDK service, and is not
+specified in this design.
+
+##### Container-based addressing and peer discovery
+
+Each container must be assigned a unique name on the filesystem
+to be used as its domain identifier for addressing and peer discovery purposes.
+
+The `${IDC_DIR}` directory in the container contains a directory entry
+for each associated domain to be connected through the inter-domain
+communication mechanism. As described in [][Linux container setup],
+the container manager is responsible for mounting a dedicated shared space
+to host the socket for each container pair.
+
+The name of the mount point for the shared directory in a container should be
+the same as the name of the counterpart peer.
For example, to connect an
+`automotive` and a `connectivity` domain, the shared space must be mounted
+in the `automotive` container on the `${IDC_DIR}/connectivity/` path and must
+be mounted in the `connectivity` container on the `${IDC_DIR}/automotive/`
+path.
+
+On startup, each container in the pair must try to `unlink()` any stale file
+in the shared spaces and then create a Unix Domain Socket named `socket` there.
+Since the shared directory is mounted with write permissions only on a single
+domain, the `unlink()` and `bind()` calls on the Unix socket file will fail on
+the other domain, which only has read permissions.
+
+Once it has removed any stale file and successfully created the socket, the
+first container in the pair must then `listen()` on it: for instance,
+the `automotive` domain must listen on the `${IDC_DIR}/connectivity/socket`
+Unix socket. The second container in the pair must instead wait for the
+`socket` file to be available and must connect to it as soon as it is created:
+for instance, the `connectivity` domain must wait for the `${IDC_DIR}/automotive/socket`
+file to appear and connect to it.
+
+#### Encryption
+
+The confidentiality, integrity and authentication of the inter-domain
+communications link is provided by IPsec in transport mode for networked
+setups, and by kernel-provided Unix Domain Sockets in
+[container-based setups][Linux container setup].
+
+**Open question**: What more detailed configuration options can we
+specify for setting up IPsec? For example, disabling various optional
+features which are not needed, to reduce the attack surface. What IKE
+service should be used?
+
+The system should use an IPsec security policy which drops traffic
+between the CE and AD unless IPsec is in use. The security policy should
+not specify behaviour for communications with other peers.
+
+Each domain must have an X.509 certificate (essentially, a public and
+private key pair), which is used for automatic keying for the IPsec
+connections. The certificates installed in the automotive domain must be
+signed by a certificate authority (CA) specific to the automotive domain
+and possibly the OEM. The certificates installed in the CE domain must
+be signed by a CA specific to the CE domain and possibly the OEM.
+
+A domain (automotive or CE) which is in developer mode must use a
+certificate which is signed by a developer mode CA, not the production
+mode CA. This allows a production mode domain to prevent connections
+from a developer mode domain.
+
+See [][Appendix: Software versus hardware encryption] for a comparison of software and hardware encryption.
+
+In order to maintain confidentiality of the connection, the keys for the
+IPsec connection must be kept confidential, which means they must be
+stored in memory which is not accessible to an attacker who has physical
+access to the system (see [][Tamper evidence and hardware encryption]); or they must be encrypted under a
+key which is stored confidentially (a key-encrypting key, KEK). Such a
+confidential key store should be provided by the Secure Boot
+design — if available, confidentiality of the inter-domain
+communications can be guaranteed. If not available, inter-domain
+communications will not be confidential if an attacker can extract the
+boot keys for the system and use them to extract the inter-domain
+communications keys.
+
+> As of February 2016, the Secure Boot design is still
+> forthcoming.
+
+See [][Tamper evidence and hardware encryption] for further discussion
+of the hardware base for confidentiality and integrity of the system.
+
+**Open question**: A lot of business logic for control over OEM
+licensing can be implemented by the choice of the CA hierarchy used by
+the inter-domain communication system. What business logic should be
+possible to implement?
+
+**Open question**: Consider key control, revocation, protocol
+obsolescence, and various extensions for pinning keys and protocols.
+
+**Open question**: What can be done in the automotive domain to reduce
+the possibility of exploits like [Heartbleed] affecting the
+inter-domain communications link? This is a trade-off between the
+stability of AD updates (high; rarely released) and the pace of IPsec
+and TLS security research and updates and the need for crypto-agility
+(fast). Heartbleed was a bug in a bad implementation of an optional and
+not-very-useful TLS extension.
+
+#### Control protocol
+
+The control protocol provides push and pull method call semantics and a
+type system for marshalling method call parameters and return values —
+but it does not prescribe a specific set of APIs which it will
+transport. It must be flexible in the set of APIs which it transports.
+
+We suggest using D-Bus over TCP as the control protocol, using a private
+bus between the automotive domain and the consumer–electronics domain.
+For multiple CE domain configurations, each pair of automotive and
+consumer–electronics domains would have its own private bus.
+
+The transport should be implemented using D-Bus’ TCP [socket transport]
+mechanism. Authentication, confidentiality and integrity are
+provided by the underlying IPsec connection. D-Bus implements its own
+message framing on top of the TCP stream.
+
+On this bus, APIs from the automotive domain would be exposed as
+services; the CE domain can then call methods on those services, or
+receive signals from them.
+
+D-Bus was chosen as it implements the necessary functionality, reuses a
+lot of the technologies already in use in Apertis, is stable, and is
+familiar to Apertis developers. Note that we suggest D-Bus the
+*protocol*, not necessarily dbus-daemon the *message bus daemon* or
+libdbus the reference *protocol library*. D-Bus the protocol provides:
+
+  - Method calls (pull semantics) with exactly one reply, supporting
+    timeouts
+
+  - Error responses
+
+  - Signals (push semantics)
+
+  - Properties
+
+  - Strong type system
+
+  - Introspection
+
+There are several important points here: introspection means that the
+D-Bus services on the AD can send their API definitions to the CE at
+runtime if needed, so that the CE does not have to have access to header
+files (or similar) from the AD. It also means the API definition can
+change without needing to recompile things — for example, an update to
+the AD could expose new APIs to the CE without needing to update header
+files on the CE. Finally, method calls support ‘in’ and ‘out’ parameters
+(multiple return values) which allows for bi-directional communication
+in the control protocol.
+
+**Open question**: How should the multiple CE configuration ([][Configuration designs])
+interact with D-Bus signals? Can the adapter layer perform the
+broadcast to all subscribers?
+
+The D-Bus protocol is stable, and has maintained backwards compatibility
+with all previous versions since [2006][dbus-stability].
If changes to the D-Bus
+protocol are introduced in future, they will be introduced as extensions
+which are used optionally, if supported by both peers on the bus. Hence
+backwards compatibility is maintained.
+
+#### Data connections
+
+If a service wishes to send high-bandwidth data between the domains, it
+must open a new data connection. Data connections are created on demand,
+and are subject to traffic control, so the AD may, for example, reject a
+connection request or throttle its bandwidth in order to maintain
+quality of service for existing connections.
+
+The inter-domain communication protocol provides two types of data
+connection: TCP-like and UDP-like. These are implemented as TCP or UDP
+connections between the two domains, running over IPsec. IPsec provides
+the necessary authentication, confidentiality and integrity of the data;
+TCP or UDP provide the multiplexing between connections (see the
+IPsec protocol stacks figure in [][IPsec versus TLS]).
+For the [][Linux container setup], a Unix domain socket is used as the IDC link;
+the local kernel provides the needed authentication, confidentiality
+and integrity of the data.
+Services must implement their own application-specific protocols on top
+of the TCP or UDP connection they are provided. For example, a video
+service may use a lossy synchronised audio/video protocol over UDP for
+sending video data together with synchronised audio; while a download
+service may use HTTP over TCP for sending downloads between domains.
+(See [here][Appendix: Audio and video decoding] for a discussion of options for implementing video and
+audio decoding.) Such protocols are not defined as part of this design —
+they are the responsibility of the services themselves to design and
+implement.
+
+Data connections are opened by sending a request to one of the
+inter-domain services ([][Protocol library and inter-domain services]), specifying desired characteristics
+for the connection, such as whether it should be TCP-like or UDP-like,
+its bandwidth and latency requirements, etc. The connection will be
+opened and a unique identifier and file descriptor for it returned to
+the requesting service. This service must then send the identifier over
+the control connection so that the corresponding service in the other
+domain can request a file descriptor for the other end of the connection
+from its inter-domain service.
+
+**Open question**: Could this be simplified by using D-Bus’ support for
+file descriptor passing? D-Bus’ TCP transport currently explicitly does
+not support file descriptor passing, so implementing it that way without
+introducing incompatibilities requires planning.
+
+It is tempting to extend D-Bus’ support for file descriptor (FD) passing
+so that it operates over TCP to provide these data connections. However,
+that would effectively be a fork of the D-Bus protocol, which we do not
+want to maintain as part of this system. Secondly, because of the way FD
+passing works — the peer passes an FD to the dbus-daemon and asks for it
+to be forwarded — the peer (i.e. an SDK or OEM service) would have the
+responsibility for opening the data connection within the IPsec tunnel,
+which would be very complex.
+
+Instead, we recommend a custom API provided by the inter-domain service
+which an SDK or OEM service can call to open a new data connection,
+passing in the parameters for the connection (such as TCP/UDP, quality
+of service requirements, etc.).
The inter-domain service would
+communicate over a private control API with the other inter-domain
+service to open and authenticate the connection at both ends, and return
+a file descriptor and cryptographic nonce (a securely random value at
+least 256 bits long) to the original SDK or OEM service. This service
+can use that file descriptor as the data connection, and should pass the
+nonce over its own control protocol to the corresponding OEM or SDK
+service. This service should then pass the nonce to its inter-domain
+service and will receive the file descriptor for the other end of the
+data connection in reply.
+
+Both inter-domain services should retain their file descriptors (which
+they have shared with the OEM and SDK services) for the data connection,
+so that if the kill switch ([][Disabling the CE domain]) is enabled, they can call
+shutdown() on the data connection to forcibly close it.
+
+The inter-domain services must reserve all well-known names starting
+with `org.apertis.InterDomain` (for example, `org.apertis.InterDomain1` or
+`org.apertis.InterDomain1.DataConnections`), and similarly all D-Bus
+interface names. This means they must not allow these names to be used
+as part of the OEM API shared between the export and adapter layers
+([][Interaction of the export and adapter layers]).
+
+A data connection cannot exist without an associated control connection
+(though one control connection may be associated with many data
+connections). As data connections are opened and controlled through APIs
+defined on the inter-domain services, there is no need for standard
+network-style service discovery using protocols like [DNS-SD] or
+[SSDP].
+
+#### Time synchronization
+
+As a distributed system, the inter-domain services may require a shared clock
+across the domains. Time synchronization is critical for correlating events,
+and is especially important when playing audio and video streams, for example.
+If those streams are decoded on the CE and need to be played by the AD, the AD
+and the CE should agree on the meaning of the timestamps embedded in the streams.
+
+For the synchronization, there are two suitable protocols:
+
+  - [NTP] is a well-known protocol to synchronise time among remote systems.
+    It provides millisecond or sub-millisecond accuracy over the Internet or
+    local area networks, respectively;
+
+  - [PTP] provides microsecond or sub-microsecond accuracy and is designed for
+    local area networks.
+
+In terms of latency calculation, both protocols satisfy the requirements, but we recommend
+PTP for the following reasons:
+
+  - NTP uses hierarchical time sources, whereas PTP has a simpler master/slave model.
+    With NTP, any system on the network, even an untrusted domain, could be picked
+    up by a CE domain as a time source;
+
+  - PTP supports hardware-assisted timestamps to improve accuracy. Under Linux,
+    the PTP hardware clock (PHC) subsystem is used to produce timestamps on
+    supported network devices.
+
+#### Audio streams
+
+To share audio streams, [RTP] and its companion protocol [RTCP] are recommended
+on both networked and container-based setups, for encoded and decoded streams.
+
+They provide jitter compensation, out-of-sequence handling and synchronization
+across multiple different streams.
+
+In particular, [multiplexed RTP/RTCP][Appendix: Multiplexing RTP and RTCP]
+can be used to multiplex both protocols over the kind of data connections
+described above.
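+
+As an illustration, the RFC 5761 rule for demultiplexing RTP and RTCP
+received on a single data connection can be sketched as follows; this
+is a minimal example, not a normative part of the design:
+
+---
+# A sketch of the RFC 5761 demultiplexing rule: the second octet of an
+# RTCP packet holds its packet type (values 192-223), a range which the
+# RTP payload types used on a multiplexed connection must avoid.
+def classify_packet(datagram: bytes) -> str:
+    if len(datagram) < 2:
+        return "invalid"
+    if 192 <= datagram[1] <= 223:
+        return "rtcp"
+    return "rtp"
+---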
+
+#### Decoded video streams
+
+A fully decoded video stream consumes large quantities of bandwidth, and sharing
+it between domains using the same approach used by audio (RTP) can only work
+for very small resolutions (see [][Memory bandwith usage on the i.MX6 Sabrelite]
+for the bandwidth limitations on one of the platforms targeted by Apertis).
+
+If a domain sends an uncompressed 1080p video stream at 25fps in YUV422 format to
+another domain, it requires a bit more than 100MB/s for the stream
+transfer alone. This already makes it prohibitive on Gigabit Ethernet systems, which
+have a theoretical maximum bandwidth of 125MB/s, without including any framing
+overhead. Even for local transfers this is a significant portion of the total
+memory bandwidth, even more so when taking into account other activities including
+the actual decoding and playback, plus the need for the same memory bandwidth
+toward the GPU where the decoded frames need to be composed.
+
+To be able to handle 1080p video streams it is very important that zero-copy
+mechanisms are used for the transfer of frames; see
+[][Appendix: Audio and video decoding] for further considerations about how a
+protocol can be defined to match such expectations.
+
+#### Bulk data transfers
+
+Data connections are suitable for transfers that involve large amounts of
+static contents such as firmware images.
+
+To avoid storing multiple copies of the same data on the limited local storage,
+for instance in cases where the contents are downloaded from the Internet by a
+lower-privilege domain before being handed over to a more isolated
+higher-privilege domain, validation of the data such as checksum verification
+should be done on the fly by the originator, and only the recipient should store
+the data on its local storage.
+
+Raw TCP or UDP sockets over IPsec can be suitable for the inter-domain
+data transfer: IPsec provides integrity and confidentiality, and TCP
+adds reliability. The downside of this approach is that each application would
+need to handle data validation and resumable transfers on its own: for this
+reason it is preferable to handle basic data validation in the inter-domain
+communication layers and provide the data to the receiver only once it is
+complete and matches the specified cryptographic hashes.
+
+The basic API is thus aimed at senders downloading large contents from the
+Internet and directly streaming them across the domains without storing them
+locally, doing on-the-fly cryptographic validation of the streamed data.
+The contents are received and re-validated on the destination domain, where
+they are stored in a file which is passed to the destination service once the
+transfer is complete and valid.
+
+When the destination service has received the file handle it must perform any
+additional verification of the contents. It can also link the anonymous file
+descriptor to a locally-accessible file path using the [`linkat()`][man 2 link]
+syscall with the `AT_EMPTY_PATH` flag or use the
+[`copy_file_range()`][man 2 copy_file_range] syscall to get a copy of the
+contents in the most efficient way that the kernel can provide.
+
+A different mechanism can be defined where the sender stores the
+contents in a private file and passes a file descriptor pointing to it to the
+inter-domain communication subsystem. The receiving side then uses the
+`copy_file_range()` syscall to get a copy of the data that cannot be altered
+by the sender and then validates the data.
On filesystems that support
+reflinks, `copy_file_range()` will automatically use them to provide fast
+copy-on-write clones of the original file: this would make the operation
+nearly instantaneous regardless of the amount of data, and would avoid doubling
+the storage requirements. When reflinks cannot be used,
+`copy_file_range()` will do an in-kernel copy, avoiding the unnecessary
+context switches of normal user-space copy operations.
+Such an approach can be used on container-based setups or when a cluster
+file system is shared across networked domains.
+Not many filesystems can handle reflinks, but Btrfs and the OCFS2 cluster
+filesystem support them.
+
+On systems set up such that reflinks can be used, this solution is much more
+efficient than the alternatives, but imposes constraints on the whole system
+that may not be acceptable, such as requiring filesystems that support reflinks
+(such as Btrfs or OCFS2) on all the domains and ensuring that the appropriate
+shared filesystem mounts are available to SDK services.
+For this reason, the socket-based approach is recommended in the general case.
+
+#### Data connections API
+
+This section drafts a proposed D-Bus API that SDK services could use to request
+the creation of data channels separated from the control plane connection.
+
+The gateway and adapter layers are responsible for the creation and initialization
+of those channels, while other services and applications must not be able to
+directly create them.
+
+The gateway and adapter layers instead use file descriptor passing to share
+the channel endpoints with the requesting services and applications.
+
+The API drafted here is only meant to provide a very rough guideline for
+those implementing any real data channel API and it's not meant to be
+normative: real implementations can diverge from the interfaces described
+here and the actual API to be used by SDK services must be documented in a
+separate specification.
+
+---
+/* The interface exported by the adapter/gateway to SDK services to initiate channel creation. */
+interface org.apertis.InterDomain.DataConnection1 {
+    /* @id: the app-specific unique token used to identify and authorize the channel
+     * @destination: the bus name of the service which should be at the other end of the channel
+     * @type: the kind of data and the protocol to be used for the data exchange.
+     *        Use 'audio-rtp' for multiplexed RTP/RTCP (RFC 5761).
+     * @metadata_in: a dictionary of extra information that can be used to authorize/validate the transfer
+     * @metadata_out: the @metadata_in dictionary with additional information
+     * @fd: the file descriptor for the actual data exchange using the protocol specified by @type */
+    method CreateChannel(in s id,
+                         in s destination,
+                         in s type,
+                         in a{sv} metadata_in,
+                         out a{sv} metadata_out,
+                         out h fd)
+
+    /* @id: see org.apertis.InterDomain.DataConnection1.CreateChannel()
+     *
+     * If the receiver was not able to validate the channel, the `org.apertis.InterDomain.ChannelError`
+     * error is raised. */
+    method CommitChannel(in s id)
+
+    /* @id: see org.apertis.InterDomain.DataConnection1.CreateChannel() */
+    method AbortChannel(in s id)
+
+    /* @refclk: the reference to the IDC shared clock, in the format defined
+     *          by the `clksrc` production of RFC7273 for the `ts-refclk:` parameter */
+    method GetClockReference(out s refclk)
+}
+
+/* The interface to be exported by services that can handle incoming channels.
+ * Domains that do not use a local dbus-daemon can implement a similar mechanism
+ * with the native IPC system. */
+interface org.apertis.InterDomain.DataConnectionClient1 {
+    /* @id: see org.apertis.InterDomain.DataConnection1.CreateChannel()
+     * @sender: the bus name of the service which initiated the channel creation
+     * @type, @metadata_in, @metadata_out: see org.apertis.InterDomain.DataConnection1.CreateChannel()
+     * @proceed: true if the channel should be set up, false if it should be refused */
+    method ChannelRequested(in s id,
+                            in s sender,
+                            in s type,
+                            in a{sv} metadata_in,
+                            out a{sv} metadata_out,
+                            out b proceed)
+
+    /* @id: see org.apertis.InterDomain.DataConnection1.CreateChannel()
+     * @success: whether the connection has been successfully set up and @fd is usable
+     * @fd: the file descriptor from which to read the incoming data with the
+     *      previously agreed protocol */
+    method ChannelCreated(in s id,
+                          in b success,
+                          in h fd)
+}
+
+/* The interface private to gateway/adapter services to cross the domain boundary. */
+interface org.apertis.InterDomain.DataConnectionInternal1 {
+    /* @id: see org.apertis.InterDomain.DataConnection1.CreateChannel()
+     * @sender: see org.apertis.InterDomain.DataConnectionClient1.ChannelRequested()
+     * @destination, @type, @metadata_in, @metadata_out: see org.apertis.InterDomain.DataConnection1.CreateChannel()
+     * @proceed: see org.apertis.InterDomain.DataConnectionClient1.ChannelRequested()
+     * @nonce: a one-time value used to authenticate the socket
+     * @socket_addr: the proto:addr:port string to be used to connect to the remote service */
+    method RequestChannel(in s id,
+                          in s sender,
+                          in s destination,
+                          in s type,
+                          in a{sv} metadata_in,
+                          out a{sv} metadata_out,
+                          out b proceed,
+                          out s nonce,
+                          out s socket_addr)
+
+    /* @id: see org.apertis.InterDomain.DataConnection1.CreateChannel()
+     * @sender: see org.apertis.InterDomain.DataConnectionClient1.ChannelRequested()
+     * @destination: see org.apertis.InterDomain.DataConnection1.CreateChannel()
+     *
+     * If the receiver was not able to validate the channel, the `org.apertis.InterDomain.ChannelError`
+     * error is raised. */
+    method CommitChannel(in s id,
+                         in s sender,
+                         in s destination)
+
+    /* @id: see org.apertis.InterDomain.DataConnection1.CreateChannel()
+     * @sender: see org.apertis.InterDomain.DataConnectionClient1.ChannelRequested()
+     * @destination: see org.apertis.InterDomain.DataConnection1.CreateChannel()
+     */
+    method AbortChannel(in s id,
+                        in s sender,
+                        in s destination)
+}
+---
+
+#### Data channel API flow example for a media player streaming audio
+
+A possible use-case of the API is a Media Player frontend hosted on the AD with the backend on the CE.
+The frontend requests the backend to decode a specific stream using an application-specific API,
+passing a token with the request.
+
+---
+              AD               |                 CE
+ media player       gateway    |    adapter         media player
+   frontend                    |                       backend
+     o ------ Play() -------o--|--o-----------------------------> o
+                            |  |  o <----- CreateChannel() ------ o
+                            o <--- RequestChannel() --- o
+     o <- ChannelRequested() - o
+     o - ChannelRequested() -> o
+              reply
+                            o --- RequestChannel() ---> o
+                                        reply
+                            o <-- connect and nonce --- o
+     o <- ChannelCreated() --- o  |  o ---- CreateChannel() ----> o
+                               |  |           reply
+     o <---------------------- data channel --------------------- o
+---
+
+The Media Player frontend initially calls
+the application-specific `Play()` method on its backend,
+with the IDC system transparently proxying the request across domains.
+This call must also carry an application-specific token
+that will be used to identify the request during the channel creation procedure.
+
+Once the Media Player backend has gathered some metadata about the stream to be played,
+it requests the creation of an `audio-rtp` channel directed to the Media Player frontend
+by calling the `org.apertis.InterDomain.DataConnection1.CreateChannel()` method
+on the local adapter service.
+
+The adapter service will then access the inter-domain link
+by calling the `org.apertis.InterDomain.DataConnectionInternal1.RequestChannel()` method
+of the remote gateway peer.
+
+The gateway service on the AD notifies the Media Player frontend
+that a channel has been requested,
+passing the request token and other application-specific metadata.
+If the token matches and the metadata is acceptable,
+the Media Player frontend replies to the gateway service telling it to proceed.
+
+Once the request has been accepted by the destination,
+the gateway service creates a listening socket for the requested channel type
+and returns the information needed to connect to it to the remote adapter peer,
+including a nonce to authenticate the connection.
+
+As soon as the adapter gets the socket information,
+it connects to the socket and sends the nonce over it.
+On the other side, the gateway reads the nonce and,
+if it does not match, immediately closes the connection.
+
+Once the connection has been set up and the nonce has been successfully shared,
+the adapter and gateway services
+will hand over the file descriptors of the sockets that have been set up.
+
+#### Data channel API flow example for an update manager sharing firmware images
+
+The bulk data transfer API is meant to be useful for update managers where an
+agent in the Connectivity Domain fetches firmware images from the Internet and
+shares them with the update manager in the AD which has access to the devices
+to be updated.
+
+---
+              AD               |                 CD
+update manager      gateway    |    adapter           OTA agent
+     o ---- GetUpdate() ----o--|--o-----------------------------> o
+                            |  |  o <----- CreateChannel() ------ o
+                            o <--- RequestChannel() --- o
+     o <- ChannelRequested() - o
+     o - ChannelRequested() -> o
+              reply
+                            o --- RequestChannel() ---> o
+                                        reply
+                            o <-- connect and nonce --- o
+                            |  |  o ---- CreateChannel() -------> o
+                            |  |           reply
+                            o <------ data channel -------------- o
+                            |  |  o <----- CommitChannel() ------ o
+                            o <--- CommitChannel() ---- o
+     o <- ChannelCreated() --- o  |  o ---- CommitChannel() ----> o
+                               |  |           reply
+---
+
+The update manager calls the `GetUpdate()` method of the agent, with a token
+identifying the request. The OTA agent retrieves the metadata of the file to
+be shared, in particular the size and a set of cryptographic hashes.
With that
+information, it requests the creation of a `bulk-data` channel with the
+`org.apertis.InterDomain.DataConnection1.CreateChannel()` method of the local
+adapter service.
+The OTA agent must specify the `size` parameter and a known cryptographic hash
+such as `sha512` in the `metadata_in` parameter. It must then check the
+`metadata_out` dictionary for the `offset` parameter to determine whether it
+must resume an interrupted download.
+
+The adapter service accesses the inter-domain link by calling the
+`org.apertis.InterDomain.DataConnectionInternal1.RequestChannel()` method of
+the remote gateway peer.
+
+The flow is analogous to the one in the
+[streaming media player case][Data channel API flow example for a media player streaming audio]
+until the point where the inter-domain socket is created: while the receiving
+end of the socket in the streaming case is meant to be used by the receiving
+service, in the bulk data case it is used directly by the gateway, which stores
+the received data in a local file.
+
+While it sends data through the socket, the OTA agent is expected to perform
+on-the-fly data validation by computing cryptographic hashes on the streamed
+contents: once it has sent all the data, the agent can close the socket and call
+`org.apertis.InterDomain.DataConnectionInternal1.CommitChannel()` to signal
+that all the data has been shared successfully and that the computed hashes
+match, or `AbortChannel()` otherwise.
+
+Upon receiving the `CommitChannel()` message, the gateway checks that the file
+size and cryptographic hashes match the expected values, and raises the
+`ChannelError` error otherwise. If and only if the data is valid, it instead
+shares the file descriptor pointing to the file with the update manager via a
+`ChannelCreated()` call.
+
+### Traffic control
+
+[Traffic control] should be set by the inter-domain service ([][Protocol library and inter-domain services])
+in the CE domain, using the standard Linux traffic control
+functionality in the [kernel][linux-traffic-control]. As the control connection and each
+data connection are separate TCP or UDP connections, they can have
+traffic controls applied to them individually, which allows different
+quality of service settings for individual data connections; and allows
+the control connection to have a higher quality of service than all data
+connections, to help ensure it has guaranteed low latency.
+
+Applying traffic control in the CE domain has the advantage of knowing
+what kernel functionality is available — if it were applied in the
+automotive domain, its functionality would be limited by whatever is
+provided by the automotive OS (for example, QNX). It has the
+disadvantage, however, of being vulnerable to the CE domain being
+compromised: if an attacker gains control of the inter-domain service in
+the CE domain, they can disable traffic control. However, if they have
+gained control of that service, the only remaining mitigation is for the
+automotive domain to shut down the CE domain, so having control over
+traffic policy has little effect.
+
+The specific traffic control policies used by the inter-domain service
+can be determined later, based on the relative priorities an OEM assigns
+to different types of traffic.
+
+### Protocol library and inter-domain services
+
+The inter-domain communications protocol should be implemented as a
+library, containing all layers of the protocol.
The particular domain
+configuration which the library targets should be a configure-time
+option, though the library must support enabling the [][Standalone setup]
+transport in conjunction with another transport, when in developer mode
+(see [][Mock SDK implementation]).
+
+By implementing the protocol as a library, it can be tested easily by
+being linked into unit tests — rather than trying to wrap the entire
+inter-domain service daemon in a test harness. Internally, the library
+should implement all protocol layers separately and expose them to the
+unit tests so that they can be tested individually.
+
+Furthermore, this allows the protocol code to be reused between the
+inter-domain service in the automotive domain, and the inter-domain
+service in the CE domain.
+
+The main advantage of implementing the protocol as a library is the
+flexibility this provides for integrating it into different automotive
+domain implementations — it can be integrated into an existing system
+service (bearing in mind the suggestion to keep it in a separate trust
+domain, [][Security domains]), or could be used as a stand-alone service daemon.
+
+A reference implementation of such a stand-alone inter-domain service
+program should be provided with the protocol library. This should
+provide the necessary systemd service file and AppArmor profile to allow
+itself to be strictly confined if the automotive domain OS supports
+this.
+
+As the inter-domain communications protocol uses D-Bus, the protocol
+library must contain an implementation of the D-Bus protocol. Note that
+this is *not* a D-Bus daemon; it is a D-Bus library, like libdbus or
+GDBus. See [][Appendix: D-Bus components and licensing]
+for details about the different components in
+D-Bus and their licensing.
+
+Apart from its D-Bus library dependency, the protocol library should be
+designed with minimal dependencies in order to be easy to integrate
+into a variety of automotive domain operating systems (from Linux
+through to other Unixes, QNX or AUTOSAR). If the chosen D-Bus library is
+available as part of the automotive OS (which is more likely for libdbus
+than for other D-Bus libraries), it could be linked against; otherwise,
+it could be statically linked into the protocol library.
+
+libdbus itself is already quite portable, having been known to work on
+Linux, Windows, OS X, NetBSD and QNX. It should not be difficult to port
+to other POSIX-compliant operating systems.
+
+[][Rate limiting on control messages] should be
+implemented in the protocol library, so that the same functionality is
+present in both the automotive and CE domains.
+
+The protocol library should expose the encryption keys for the IPsec
+connection used in the inter-domain communications link, including
+signals for when those keys change (due to cookie renegotiation on the
+link). The keys must only be exposed in development builds of the
+protocol library. See [][Debuggability] for more details.
+
+### Non Linux-based domains
+
+The suggested implementation uses D-Bus the *protocol*, not necessarily
+dbus-daemon the *message bus daemon* or libdbus the *protocol library*.
+
+This means that for inter-domain communications purposes, only the
+serialization format of D-Bus is used as a well-defined RPC protocol.
+There's no requirement that domains run `dbus-daemon` or that they use
+a specific D-Bus implementation to talk to other domains.
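+
+As an illustration, the following minimal sketch opens a peer-to-peer
+GDBus connection over TCP, without any dbus-daemon, and performs a
+method call on it; the address and API names are hypothetical, and in
+the real system the TCP link would be protected by IPsec:
+
+---
+# A sketch of using the D-Bus protocol without a message bus daemon.
+import gi
+gi.require_version("Gio", "2.0")
+from gi.repository import Gio
+
+conn = Gio.DBusConnection.new_for_address_sync(
+    "tcp:host=192.168.7.2,port=55556",
+    Gio.DBusConnectionFlags.AUTHENTICATION_CLIENT,
+    None, None)
+
+reply = conn.call_sync(
+    None,  # no bus name: this is a direct peer-to-peer connection
+    "/com/exampleoem/CarOS/ClimateControl1",
+    "com.exampleoem.CarOS.ClimateControl1",
+    "GetCabinTemperature",
+    None, None, Gio.DBusCallFlags.NONE, -1, None)
+---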
+
+Several implementations of the D-Bus serialization format exist and
+their use is strongly encouraged rather than reimplementing the protocol
+from scratch:
+
+ - [GDBus] is a GTK+/GNOME-oriented implementation of the D-Bus protocol in GLib
+
+ - [QtDBus] is a Qt module that implements the D-Bus protocol
+
+ - [node-dbus] is a D-Bus protocol implementation for NodeJS written in pure JavaScript
+
+ - [libdbus] is the reference implementation of the D-Bus protocol
+
+ - [dbus-sharp] is a C#/.NET/Mono implementation of the D-Bus protocol
+
+ - [pydbus] is a Python implementation of the D-Bus protocol
+
+On networked setups the D-Bus-based protocol is transported over TCP, relying on
+IPsec for authentication, confidentiality and reliability.
+
+If neither IPsec nor TLS is available, those properties cannot be guaranteed, and thus
+such a setup is strongly discouraged. In that case every input should be treated
+as potentially malicious: the trusted domains must export only a very reduced
+set of interfaces, which must be designed in such a way that any kind of misuse
+does not lead to harm.
+
+### Service discovery
+
+In accordance with the use of the D-Bus serialization protocol, each service
+exported over the inter-domain communication channels is identified
+by a well-known name subject to [specific constraints](https://dbus.freedesktop.org/doc/dbus-specification.html#message-protocol-names-bus),
+starting with the reversed DNS domain name of the author of the service
+(for instance, `com.collabora.CarOS.ClimateControl1` for a potential
+service written by [Collabora](https://collabora.com)).
+
+Only one service at a time can own such names on each domain, but the
+ownership is not tracked across domains and collisions may happen due
+to a transitional state during an upgrade or other causes: each setup
+is thus responsible for defining a deterministic collision resolution
+procedure should two domains export the same service name.
+
+The adapter layer is responsible for tracking the channel on which each
+service is available.
+The [`NameOwnerChanged` signal](https://dbus.freedesktop.org/doc/dbus-specification.html#bus-messages-name-owner-changed)
+must be used by the adapter layer to track the availability of services
+on each connection and to detect when a service is no longer available
+or changed ownership (for example because it has been restarted). The
+[`org.freedesktop.DBus.ListActivatableNames()`](https://dbus.freedesktop.org/doc/dbus-specification.html#bus-messages-list-activatable-names)
+message can be used to gather the initial list of available services.
+
+After an upgrade a domain may stop providing a specific service and
+another domain may start providing it instead: both the old and new
+domains must trigger the
+[`NameOwnerChanged` signal](https://dbus.freedesktop.org/doc/dbus-specification.html#bus-messages-name-owner-changed)
+in response to the
+[`org.freedesktop.DBus.ReleaseName()`](https://dbus.freedesktop.org/doc/dbus-specification.html#bus-messages-release-name)
+and
+[`org.freedesktop.DBus.RequestName()`](https://dbus.freedesktop.org/doc/dbus-specification.html#bus-messages-request-name)
+calls. No specific ordering is required and thus the service may be
+temporarily unavailable or the two domains may export the same service
+name at the same time: the collision resolution procedure must choose
+the one on the connection with the highest priority.
+
+In the simplest case, each domain must be given a unique priority with
+the AD having the highest priority.
The relative priority between the
+CE domains is used to provide deterministic service access when a service
+name exists on multiple connections. As a result, the priority list
+must be static and the priority of CE domains can be assigned arbitrarily
+for each specific setup.
+
+When accessing a service name that exists on more than one connection,
+the service that exists on the connection with the highest priority must
+be given precedence by the adapter layer.
+
+CE domains should not be able to spoof trusted services exported by the
+AD: for this reason a static list of services meant to be exported only
+by the AD must be defined, and the adapter layer must ignore matching
+services exported by other connections, even if the service is not
+currently available on the AD connection itself.
+
+Particular care must be taken to ensure each domain can be fully booted
+without blocking on services hosted on other domains, to avoid untracked
+circular dependencies.
+
+SDK services must access the above service names through the private bus
+instance exported by the adapter layer, which proxies them from all the
+inter-domain channels, abstracting the complexities of inter-domain
+communications. SDK services are not aware of the fact that the services
+are hosted on different domains.
+
+### Automotive domain export layer
+
+To integrate the inter-domain communications system into an automotive
+domain operating system, the APIs to be shared must be exported as
+objects on the D-Bus connection provided by the inter-domain service.
+This is done as an *export layer* in the inter-domain service in the
+automotive domain, customised for the OEM and their specific APIs. The
+export layer could be implemented as pure C calls from within the same
+process (no protocol at all), or D-Bus, or kdbus, or QNX message
+passing, or something else entirely. If a D-Bus bus is used, a D-Bus
+daemon would need to be running on the automotive domain; otherwise, no
+D-Bus daemon would be needed.
+
+For example, if the automotive domain provides the APIs which are to be
+exposed over the inter-domain connection as:
+
+ - C APIs in headers — the inter-domain service would call those APIs
+   directly, and the export layer would essentially be those C calls;
+
+ - daemons with UNIX socket connections — the inter-domain service
+   would connect to those sockets and run whatever protocol is
+   specified by the daemons, and the export layer would essentially be
+   the socket connections and protocol implementations;
+
+ - D-Bus services — the inter-domain service would connect to a D-Bus
+   daemon on the automotive domain and translate the services’ D-Bus
+   APIs into an API to expose on the inter-domain communications link
+   (see below), and the export layer would be the D-Bus daemon, D-Bus
+   library in the inter-domain service, and the code to translate
+   between the two D-Bus APIs.
+
+The APIs must be exported under [well-known names] formatted as
+reverse-DNS names owned by the OEM. For example, if the AD operating
+system was written by Collabora, APIs would be exported using well-known
+names starting with `com.collabora`, such as
+`com.collabora.CarOS.EngineManagement1` or
+`com.collabora.CarOS.ClimateControl1`.
+
+The API formed by these exported D-Bus objects is vendor-specific, but
+should maintain its own stability guarantees — for every
+backwards-incompatible change to this API, there must be a corresponding
+update to the CE domain to handle it.
Consequently, we recommend
+[versioning the exported D-Bus APIs][dbus-api-versioning].
+
+APIs which the OEM does not want to make available on the inter-domain
+communications link (for example, because they are not able to handle
+untrusted data, or are too powerful to expose) must not be exported onto
+the D-Bus connection. This effectively forms a whitelist of exposed
+services.
+
+For each piece of functionality exposed by the AD, suitable safety
+limits must be applied ([][Safety limits on AD APIs]). If the implementation of that
+functionality already applies the safety limits, nothing more needs to
+be done. Otherwise, the safety limits must be enforced in the interface
+code which exports that functionality onto the inter-domain D-Bus
+connection.
+
+Similarly, for each piece of functionality exposed by the AD, if it
+fails to respond to a call by the inter-domain service, the service must
+return an error to the CE over the inter-domain D-Bus connection, rather
+than timing out. This is especially important in systems where the
+export layer is a set of C calls — the implementation must take care to
+ensure those calls cannot block the inter-domain service.
+
+If the vendor wants to implement per-API kill switches for services
+exported by the automotive domain, these must be implemented in the
+export layer (see [][Disabling the CE domain]).
+
+### Consumer-electronics domain adapter layer
+
+Paired with the OEM-specific API export code in the automotive domain is
+an *adapter layer* in the CE domain. This adapts the API exported by the
+services on the automotive domain to the stable SDK APIs used by
+applications in the CE domain. The layer has an implementation in each
+of the SDK services in the CE domain.
+
+This adapter layer does not have a trust boundary — each part of it lies
+within the trust domain of the relevant SDK service.
+
+These adapters connect to a private D-Bus bus, which the inter-domain
+service in the CE domain is also connected to. The inter-domain service
+exports the OEM APIs from the automotive domain on this bus, and the
+adapters consume them.
+
+The private bus could be implemented either by running dbus-daemon with
+a custom bus configuration, or by implementing it directly in the
+inter-domain service, and having all adapters connect directly to the
+service. In both cases, the trust boundary between the adapters (within
+the trust domains of the SDK services) and the inter-domain service is
+enforced.
+
+### Interaction of the export and adapter layers
+
+The interaction between the export and adapter layers is important in
+maintaining compatibility between different versions of the AD and CE as
+they are upgraded separately. The CE is typically upgraded much more
+frequently than the AD. Both are customised to the OEM.
+
+#### Initial deployment
+
+The OEM develops both layers, and stabilises an initial version of their
+inter-domain API, using a version number (for example, 1). The export
+layer exports objects from the automotive domain, and the adapter layer
+imports those same objects. There may be functionality exposed on the
+objects which the SDK APIs currently do not support — in which case, the
+adapter layer ignores that functionality.
+
+#### CE is upgraded, AD remains unchanged
+
+A new release of Apertis is made, which expands the SDK APIs to support
+more functionality. The OEM integrates this release of Apertis and
+updates their adapter layer to tie the new SDK APIs to previously-unused
+objects from the inter-domain link.
+
+The version number of the inter-domain API remains at 1.
+
+#### AD is upgraded, CE remains unchanged
+
+The automotive domain OS is upgraded, and more vehicle functionality
+becomes available to expose on the inter-domain connection. The OEM
+chooses to expose most of this functionality using the inter-domain
+service. For some objects, this results in no API changes. For other
+objects, it results in new methods being added, but no old ones being
+changed. For still other objects, it results in some old methods being removed
+or their semantics changed. For these objects, the OEM now exports *two*
+interfaces on the inter-domain service: one at version 1, exporting the
+old API; and one at version 2, exporting the new API. The version number
+of other inter-domain APIs remains at 1.
+
+The CE domain software remains unchanged, which means it continues to
+use the version 1 APIs. This continues to work because all objects on
+the inter-domain API continue to export version 1 APIs (in addition to
+some version 2 APIs).
+
+#### CE is upgraded again
+
+The next time the CE domain is upgraded, its adapter layer can be
+modified by the OEM to use the new version 2 APIs for some of the
+services. If this updated version of the CE domain is guaranteed to only
+be used with new versions of the AD, the adapter layer can drop support
+for version 1 APIs. If the updated CE domain may be used with old
+versions of the AD, it must support version 1 and version 2 (or just
+version 1) APIs, and use whichever it prefers.
+
+### Flow for a given SDK API call
+
+In the following figure, particular attention should be paid to the restrictions on
+the protocols in use for each link. For the links between the
+application and the inter-domain service in the CE domain, any version
+of the D-Bus protocol can be used, including kdbus or another future
+version. This depends only on the dbus-daemon and D-Bus libraries
+available in the CE domain. For the link between the two inter-domain
+services, the protocol must always be at least D-Bus 1.0 over TCP over
+IPsec. If both peers support a later version of the protocol, they may
+use it — but both must always support D-Bus 1.0 over TCP over IPsec. For
+the link between the inter-domain service in the automotive domain and
+the OEM service, whatever protocol the OEM finds most appropriate for
+implementing their export layer should be used. This could be pure C
+calls from within the same process (no protocol at all), or D-Bus, or
+kdbus, or QNX message passing, or something else
+entirely.
+
+
+
+> Apertis IDC message flow, following a message being sent from application
+> to hardware; the message flow is the same in reverse for message replies
+> from the hardware
+
+### Trusted path to the AD
+
+Providing a trusted input and output path between the user and the
+automotive domain is out of scope for this design — it is a problem to
+be solved by the graphics sharing and input handling designs. However,
+it is worth noting that the solution must not involve communication
+(unauthenticated, or authenticated via the CE domain) over the
+inter-domain link. If it did, a compromised CE domain could be used to
+forge this communication and gain control of the trusted path to the AD
+— which likely results in a large privilege escalation.
+
+A trusted path should be implemented by direct communication between the
+input and output devices and the automotive domain, or by mediating such
+communication through the hypervisor, which is trusted.
+
+### Developer mode
+
+In order to support connecting the CE domain from an SDK on a
+developer’s laptop to the automotive domain in a development vehicle,
+the ‘separate boards setup with other devices’ configuration must be
+used, with the CE domain and the automotive domain connected to the
+developer’s network (which might have other devices on it).
+
+In order to allow the SDK to connect, the vehicle must be in a
+‘developer mode’. This is necessary because the CE domain is entirely
+untrusted when it is provided by the SDK, since the developer may choose
+to disable security features in it (indeed, they may be working on those
+security features).
+
+**Open question**: What cryptography should be used to implement this
+authentication, and the division of trust between development and
+production devices? A likely solution is to only have the AD accept the
+CE connection if it connects with a ‘production’ key signed by the
+vehicle OEM.
+
+### Mock SDK implementation
+
+In order to allow applications to be developed against the Apertis SDK,
+implementations of all the SDK APIs need to be provided as part of the
+official SDK virtual machine distribution. These implementations need to
+be fully featured; otherwise, application developers cannot develop
+against the unimplemented features.
+
+There are two implementation options:
+
+1. Have an Apertis SDK adapter layer which provides the mock
+   implementations, and which does not use an inter-domain service or
+   mock up any of the automotive domain.
+
+2. Write the mock implementations as stand-alone services which are
+   logically part of the automotive domain (even though there is no
+   domain separation in the SDK). Expose these services on the
+   inter-domain link using an Apertis SDK export layer; and adapt the
+   services to the actual SDK APIs using an Apertis SDK adapter
+   layer.
+   The inter-domain services would be running in the same domain (the
+   SDK) and would communicate over a loopback TCP socket (see
+   [][Standalone setup]).
+
+Option \#1 has a much simpler implementation, but option \#2 means that
+the inter-domain communications code paths are tested by all application
+developers. Similarly, option \#1 introduces the possibility of
+behavioural differences between the mock adapter layer and the
+production inter-domain communication system, which could affect how
+application developers write their applications; option \#2 reduces that
+potential considerably.
+
+As option \#2 uses the inter-domain service in the CE domain, it also
+allows for the possibility of connecting the CE domain to a different
+automotive domain — rather than the mock one provided by the SDK, a
+developer could connect to the automotive domain in a development
+vehicle ([][Developer mode]).
+
+Hence, our recommendation is for option \#2.
+
+### Debuggability
+
+The debuggability of the inter-domain communications link is important
+for many reasons, from integrating two domains to bringing up a new
+automotive domain (with its export and adapter layers) to developing a
+new SDK API.
+
+Referring to the figure in [][Overall architecture], debugging of:
+
+ - *applications and the SDK services* happens using normal tools and
+   methods described in the [Debug and Logging design];
+
+ - *communications between the dbus-daemon (private bus) and
+   inter-domain service (CE domain)* happens using normal D-Bus
+   monitoring tools (such as [Bustle] or [dbus-monitor]),
+   though this requires the developer to gain access to the private
+   bus’ socket;
+
+ - *communications between the inter-domain services* happens using a
+   special debug option in the services (see below);
+
+ - *the export layer and OEM services* happens using tools and methods
+   specific to how the OEM has implemented the export layer.
+
+If possible, all debugging should happen on the SDK side, in the adapter
+layer or above, as this allows the greatest flexibility in debugging
+techniques — none of the communications at that level are encrypted, so
+they are accessible to a developer user with the appropriate elevated
+permissions.
+
+Debugging the connection between the inter-domain services (the
+TCP/IPsec link between domains) can be complex, as any
+debugging tool needs to be able to decrypt the IPsec encryption.
+Wireshark is [able to do this][ws-decrypt], if given the encryption key in use by the
+IPsec connection. This key may change over the lifetime of a
+connection (as the connection cookie is refreshed), and hence needs to
+be exported dynamically by the inter-domain service. In order to allow
+debugging both ends of the connection, this key export should be
+implemented in the
+protocol library ([][Protocol library and inter-domain services]). In the CE domain, it should be exposed
+as a D-Bus interface on the private bus which is part of the adapter
+layer. This limits its access to developers who have access to that bus.
+
+```
+Interface org.apertis.InterDomainConnection.Debug1 {
+  /* Mapping from IKEv1 initiator cookie to encryption key. */
+  readonly property a{ss} Ike1Keys;
+  /* Mapping from IKEv2 tuple of (initiator SPI, responder SPI) to tuple
+   * of (SK_ei, SK_er, encryption algorithm, SK_ai, SK_ar, integrity
+   * algorithm). Algorithms are enumerated types, with values to be
+   * documented by the implementation. Other parameters are provided as
+   * hexadecimal strings to allow for varying key lengths. */
+  readonly property a((ss)(ssssussu)) Ike2Keys;
+}
+```
+
+A [new Lua plugin][ws-decrypt-plugin] in Wireshark could connect to this interface and
+listen for signals of updates to the connection’s keys, and use those to
+update Wireshark’s IKE decryption table. Wireshark is the suggested
+debugging tool to use, as it is a mature network analysis tool which is
+well suited to analysing the protocols being sent over the inter-domain
+connection.
+
+In the automotive domain, the key information provided by the protocol
+library should be exposed in a manner which best fits the debugging
+infrastructure and tools available for the automotive operating system.
+
+In both domains, this interface must only be exposed in developer builds
+of the inter-domain services. It must not be available in production,
+even to a user with elevated privileges. To expose it would allow all
+inter-domain communications to be decrypted.
+
+### External watchdog
+
+There must be an external watchdog system which watches both the
+automotive and consumer-electronics domains, and which restarts either
+of them if they crash and fail to restart themselves.
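+
+Each domain signals liveness by sending periodic heartbeats to its
+watchdog. On a Linux-based domain this typically looks like the
+following sketch, which uses the standard Linux watchdog device API; the
+device path and timing values are illustrative assumptions, and a
+virtualised setup would expose an equivalent virtual device instead.
+
+``` c
+/* Sketch: a domain keeps its own watchdog from firing by sending
+ * periodic heartbeats. The device path and timing values are
+ * illustrative assumptions. */
+#include <fcntl.h>
+#include <stdio.h>
+#include <sys/ioctl.h>
+#include <unistd.h>
+#include <linux/watchdog.h>
+
+int
+main (void)
+{
+  int fd = open ("/dev/watchdog", O_WRONLY);
+
+  if (fd < 0)
+    {
+      perror ("open /dev/watchdog");
+      return 1;
+    }
+
+  /* Restart the domain if no heartbeat arrives for 30 seconds. */
+  int timeout = 30;
+  ioctl (fd, WDIOC_SETTIMEOUT, &timeout);
+
+  for (;;)
+    {
+      /* Each heartbeat resets the countdown; if the domain hangs, the
+       * watchdog fires and the domain is restarted. */
+      ioctl (fd, WDIOC_KEEPALIVE, 0);
+      sleep (10);
+    }
+}
+```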
+
+In order to prevent one compromised domain from preventing a restart of
+the other domain (a denial of service attack), each domain must only be
+able to send heartbeats to its own watchdog, and not the watchdog of the
+other domain.
+
+The implementation of the watchdog depends on the configuration:
+
+ - Standalone setup: No watchdog is necessary, as the configuration is
+   not safety critical.
+
+ - Basic virtualised setup: The watchdog should be a software component
+   in the hypervisor, exposed as virtualised watchdog hardware in the
+   guests.
+
+ - Separate CPUs setup: A hardware watchdog on the board should be
+   used, connected to both domains. As an exception to the general
+   principle that the CE domain should not be allowed to access
+   hardware, it must be able to access its own watchdog (and must not
+   be able to access the automotive domain’s watchdog).
+
+ - Separate boards setup: A hardware watchdog on each board should be
+   used, connected to the domain on that board.
+
+ - Separate boards setup with other devices: Same as the separate
+   boards setup.
+
+ - Multiple CE domains setup: Same as the separate boards setup.
+
+### Tamper evidence and hardware encryption
+
+The basic design for providing a root of confidentiality and integrity
+for the system in hardware should be provided by the Secure Boot
+design — this design can only assume that some confidential
+encryption key is provided, which is used at boot to decrypt the parts
+of the system which should remain confidential.
+
+> As of February 2016 the Secure Boot design is still forthcoming
+
+One possibility for implementing this is for a confidential key store to
+be provided by the automotive domain, storing keys which encrypt the
+bootloader and root key store for the CE. When booting the CE, the AD
+would decrypt its bootloader and hence its root key store, making the
+keys necessary for inter-domain communications (amongst others)
+available in the CE’s memory. Note that this suggestion should be
+ignored if it conflicts with recommendations in the Secure Boot design,
+once that’s published.
+
+A critical requirement of the system is that none of the keys for
+encrypting inter-domain communications (or for protecting those keys)
+can be shared between vehicles — they must be unique per vehicle
+([][No global keys in vehicles]). This implies that keys must be generated and
+embedded into each vehicle as a stage in the imaging process for the
+domains.
+
+A corollary to this is that none of those keys can be stored by the
+vendor, trusted dealer or other global organisations associated with the
+vehicles; as to do so would provide a single point of failure which, if
+compromised by an attacker, could reveal the keys for all vehicles and
+hence potentially allow them all to be compromised easily.
+
+Tamper evidence is an important requirement for the system ([][Tamper evidence]),
+providing the ability to determine if a vehicle has been tampered
+with in case of an accident or liability claim.
+
+The most appropriate way to provide tamper evidence for the hardware
+depends on the hardware and how it is packaged in the vehicle. Typical
+approaches to tamper evidence involve sealing the domain’s circuitry,
+including all access and I/O ports, in a casing which is closed with
+tamper-evident [seals][security-seal].
If a garage or trusted vehicle dealer needs +to access the domain for maintenance or updates, they must break the +seals, enter this in the vehicle’s maintenance log, and replace the +seals with new ones once maintenance is complete. + +Tamper evidence for software should be provided through the integrity +properties of the Secure Boot design, as in any [trusted platform module] +system. + +### Disabling the CE domain + +The automotive domain must be able to disable the power supply to the CE +domain (or otherwise prevent it from booting), and must be able to +prevent inter-domain communications at the same time. + +Preventing inter-domain communications should be implemented by having +the automotive domain inter-domain service read a ‘kill switch’ setting. +If this is set, it should close any open inter-domain communication +links, and refuse to accept new ones while the setting is still set. + +Preventing the CE domain from booting can be done in a variety of ways, +depending on the hardware functionality available. For example, it could +be done by controlling a solid-state relay on the CE domain’s power +supply. Or, if the CE domain implements secure boot, the boot process +could require the automotive domain to decrypt part of the CE domain +bootloader using a key known only to the automotive domain — if the kill +switch is set, this key would be unavailable. + +**Open question**: What hardware provisions are available for +controlling the power supply or boot process of the CE domain? How +should this integrate with the secure boot design? + +The kill switch is intentionally kept simple, controlling whether *all* +inter-domain communications are enabled or disabled, and providing no +finer granularity. This is intended to make it completely robust — if +support were added for selectively killing some of the control APIs or +data connections on the inter-domain communications link, but not +others, there would be much greater scope for bugs in the kill switch +which could be exploited to circumvent it. + +If the OEM wants to provide finer grained kill switches for different +APIs in the automotive domain, they must implement them as part of those +services, or as part of the export layer which connects those services +to the inter-domain service. + +### Reporting malicious applications + +There are three options for reporting malicious behaviour of +applications to the Apertis store: + +1. Report from the inter-domain service in the automotive domain, based + on error responses from the OEM APIs. + +2. Report from the inter-domain service in the CE domain, based on + error responses from the automotive domain. + +3. Report from the SDK API adapter layers, based on error responses + from the automotive domain. + +They are presented in decreasing order of reliability, and increasing +order of helpfulness. + +Option \#1 is reliable (an attacker can only prevent a detected +malicious action from being reported by compromising the automotive +domain), but not helpful (the automotive domain does not have contextual +information about the access, such as the application bundle which +originally made the request — bundle identifiers cannot be sent across +the inter-domain link as that would mean partially defining the OEM +APIs). 
This option has the additional disadvantage that it requires the
+AD to communicate directly with the Apertis store without going via the
+CE, which likely means the AD is on the Internet and could potentially
+be compromised by a Heartbleed-style vulnerability in a communication
+path that was intended to be secure. Options \#2 and \#3 do not have
+this disadvantage, because in those options it is the CE that needs to
+communicate on the Internet.
+
+Option \#3 is unreliable (an attacker can prevent a detected malicious
+action from being reported by compromising that SDK service in the CE
+domain), but most helpful (the CE domain knows all contextual
+information about the access, including the application bundle
+identifier, parameters sent to the SDK API by the application, and the
+output of the adapter layer which was sent to the inter-domain link).
+
+We recommend option \#3 as it is the most helpful, and we believe that
+the additional contextual information it provides outweighs the
+potential loss of reports from the most severely compromised vehicles. This
+is one part of many which contribute to the security of the system.
+
+An alternative would be to implement two or all of the options, and
+leave it up to the Apertis store software to combine or deduplicate the
+reports.
+
+### Suggested roadmap
+
+Once the design has been reviewed, it can be compared to the existing
+state of the inter-domain communication system, and a roadmap produced
+for how to reconcile the differences (if there are any).
+
+**Open question**: How does this design compare to the existing state of
+the inter-domain communication system?
+
+### Requirements
+
+**Open question**: Once the design is finalised a little more, it can be
+related back to the requirements to ensure they are all satisfied.
+
+## Open questions
+
+ - [][Existing inter-domain communication systems]: Are there any relevant existing systems to compare against?
+
+ - [][IPSec versus TLS]: What is the security of the IPsec protocol in its current
+   (2015) state?
+
+ - [][IPSec versus TLS]: What is the performance of TCP and UDP over IPsec, TLS over
+   TCP and DTLS over UDP on the Apertis reference hardware?
+
+ - [][Configuration designs]: What trade-off do we want between performance and testability
+   for the different transport layer configurations?
+
+ - [][Configuration designs]: What more detailed configuration options can we specify for
+   setting up IPsec? For example, disabling various optional features
+   which are not needed, to reduce the attack surface. What IKE service
+   should be used?
+
+ - [][Configuration designs]: A lot of business logic for control over OEM licensing can be
+   implemented by the choice of the CA hierarchy used by the
+   inter-domain communication system. What business logic should be
+   possible to implement?
+
+ - [][Configuration designs]: Consider key control, revocation, protocol obsolescence, and
+   various extensions for pinning keys and protocols.
+
+ - [][Configuration designs]: What can be done in the automotive domain to reduce the
+   possibility of exploits like Heartbleed affecting the inter-domain
+   communications link? This is a trade-off between the stability of AD
+   updates (high; rarely released) and the pace of IPsec and TLS
+   security research and updates and the need for crypto-agility
+   (fast). Heartbleed was a bug in a bad implementation of an optional
+   and not-very-useful TLS extension.
+ + - [][Control protocol]: How should the multiple CE configuration (section 8.3.2) + interact with D-Bus signals? Can the adapter layer perform the + broadcast to all subscribers? + + - [][Developer mode]: What cryptography should be used to implement this + authentication, and the division of trust between development and + production devices? A likely solution is to only have the AD accept + the CE connection if it connects with a ‘production’ key signed by + the vehicle OEM. + + - [][Disabling the CE domain]: What hardware provisions are available for controlling the + power supply or boot process of the CE domain? How should this + integrate with the secure boot design? + + - [][Suggested roadmap]: How does this design compare to the existing state of the + inter-domain communication system? + + - [][Requirements]: Once the design is finalised a little more, it can be related + back to the requirements to ensure they are all satisfied. + +## Summary of recommendations + +**Open question**: Once the design is finalised a little more, and a +suggested roadmap has been produced ([][Suggested roadmap]), it can be summarised +here. + +## Appendix: D-Bus components and licensing + +The terminology around D-Bus can sometimes be confusing; here are some +details of its components and their licensing. + + - *D-Bus* is a [protocol][dbus-spec] which defines an on-the-wire format for + marshalling and passing messages between peers, a type system for + structuring those messages, an authentication protocol for + connecting peers, a set of transports for sending messages over + different underlying connection media, and a series of high-level + APIs for implementing common API design patterns such as properties + and object enumeration. + It has a reference implementation (libdbus and dbus-daemon), but + these are by no means the only implementations. + The protocol has had full backwards compatibility since [2006][dbus-stability]. + + - A *D-Bus daemon* (for example: dbus-daemon, kdbus) is a process + which arbitrates communication between D-Bus peers, implementing + multicast communications (such as signals) without requiring all + peers to connect to each other. + Different D-Bus daemons have different performance characteristics + and licensing. For example, kdbus runs in the kernel to improve + performance by reducing context switching overhead, at the cost of + some features; dbus-daemon runs in user space with more overhead, + but is still quite performant. + + - A *D-Bus library* (for example: libdbus, GDBus) is a set of code + which implements the D-Bus protocol for one peer, converting + high-level D-Bus API calls into on-the-wire messages to send to + another peer or a D-Bus daemon to send to other peers. + Different D-Bus libraries have different performance characteristics + and licensing. + +### Licensing + + - The D-Bus Specification is freely licensed and has no restrictions + on who may implement it or how those implementations are licensed. + + - libdbus and dbus-daemon are both licensed under your choice of the + [AFLv2.1], or the [GPLv2] (or later versions). + + - Hence, if the AFL license is chosen, libdbus and dbus-daemon may + be used in non-open-source products. + + - GDBus is part of GLib, and hence is licensed under the + [LGPLv2.0] (or later versions). + +## Appendix: D-Bus performance + +libdbus and dbus-daemon are reasonably performant, having been used in +various low-resource products (such as mobile phones) over the years. 
+There have not been any quantitative evaluations of their performance in +terms of latency or memory usage recently, but some have been done [in][will-dbus-perf] +[the][dbus-signal-perf] [past][ipc-perf]. + +As indicative numbers *only*, D-Bus (using [dbus-python] and +dbus-daemon, not kdbus) gives performance of roughly: + + - 20,000 messages per second throughput + + - 130MB per second bandwidth + + - 0.1s end-to-end latency between peers for a given message + + - This is likely an overestimate, as ping-pong tests written in C + have given latency of 200µs + + - 2.5MB memory footprint (RSS) for dbus-daemon in a desktop + configuration + + - So this could likely be reduced if needed — the amount of + message buffering dbus-daemon provides is configurable + +Note that these numbers are from performance evaluations on various +versions of dbus-daemon, so should be considered indicative of an order +of magnitude only. As with all performance measurements, accurate values +can only be measured on the target system in the target configuration. + +The most commonly accepted disadvantage of using D-Bus with dbus-daemon +is the end-to-end latency needed to send a message from one peer, +through the kernel, to the dbus-daemon, then through the kernel again, +to the receiving peer. This can be reduced by using kdbus, which halves +the number of context switches needed by implementing the D-Bus daemon +in [kernel space][kdbus]. However, kdbus has not yet been accepted into the +upstream kernel, and (as of February 2016) there is some concern that +this might not happen due to kernel politics. It can be integrated into +distributions as a kernel module, although it relies on a few features +only available in kernel version 4.0 or newer. This means it should be +straightforward to integrate in the CE, but potentially not in the AD +(and certainly not if the AD doesn’t run Linux — in such cases, +dbus-daemon can be used). + +Overall, the performance of a D-Bus API depends strongly on the API +design. Good [D-Bus API design] eliminates redundant round trips +(which have a high latency cost), and offloads high-bandwidth or latency +sensitive data transfer into side channels such as UNIX pipes, whose +identifiers are sent in the D-Bus API calls [as FD handles][dbus-fd-handles]. + +## Appendix: Software versus hardware encryption + +The choice about whether to use software or hardware encryption is a +tradeoff between the advantages and disadvantages of the options. There +are actually several ways of providing ‘hardware encryption’, which +should be considered separately. In order from simplest to most complex: + + - **Encryption acceleration instructions** in the processor, such as + the [AES instruction set], [CLMUL] or the [ARM cryptography extensions]. + These are available in most processors now, and + provide assembly instructions for performing expensive operations + specific to certain encryption standards, typically AES, SHA and + Galois/Counter Mode (GCM) for block ciphers. Intel architectures + have the most extensions, but ARM architectures also have some. + + - [**Secure cryptoprocessor**][secure-cryptoprocessor]. These are separate, hardened + hardware devices which implement all encryption operations and some + key storage and handling within a tamper-proof chip. They are + conceptually similar to hardware video decoders — the CPU hands off + encryption operations to the coprocessor to happen in the + background. They typically do not have their own memory. 
+
+ - [**Hardware security module**][hw-secu-module] (HSM). These are even more
+   hardened secure cryptoprocessors, which typically come with their
+   own tamper-proof memory and supporting circuitry, including a
+   tamper-proof power supply. They handle all aspects of encryption,
+   including all key storage and management (such that keys never leave
+   the HSM).
+
+### Software encryption (without encryption acceleration instructions)
+
+ - Lowest encryption bandwidth.
+
+ - Highest attack surface area, as keys and in-progress encryption
+   values have to be stored in system memory, which can be read by an
+   attacker with physical access to the hardware.
+
+ - Certain versions of some cryptographic libraries are
+   [FIPS]-certified, but not all. GnuTLS has been FIPS certified in
+   various devices, but is not [routinely certified][gnutls-cert]. OpenSSL is
+   not routinely certified, but provides an OpenSSL FIPS Object
+   Module which *is* [certified][openssl-cert] as a drop-in replacement for
+   OpenSSL, provided that it’s used unmodified. The Linux kernel’s
+   IPsec support has been certified in Red Hat Enterprise Linux 6, but
+   is not [routinely certified][rhel-cert].
+
+ - Cheaper than hardware.
+
+ - Provides the possibility of upgrading to use different encryption
+   algorithms in future.
+
+ - Possible to check the software implementation for backdoors,
+   although it’s a lot of work. Some of this work is being done by
+   [other users of open source encryption software][ncc].
+
+### Software encryption (with encryption acceleration instructions)
+
+ - Same advantages and disadvantages as software encryption without
+   encryption acceleration instructions, except that the use of
+   acceleration gives a higher encryption bandwidth (on the order of a
+   factor of 10 improvement).
+
+ - Same software interface as without acceleration.
+
+ - Both TLS and IPsec provide various cipher suite options, at least
+   some of which would benefit from hardware acceleration — both use
+   [AES-GCM] for data encryption, which benefits from AES
+   instructions.
+
+### Secure cryptoprocessor
+
+ - Higher encryption bandwidth.
+
+ - Reduced attack surface area, as keys and in-progress encryption
+   values are handled within the encryption hardware, rather than in
+   general memory, and hence cannot be accessed by an attacker with
+   physical access. Keys may still leave the cryptoprocessor, which
+   gives some attack surface.
+
+ - Typical secure cryptoprocessors have tamper evidence features in the
+   hardware.
+
+ - Typically hardware is FIPS-certified.
+
+ - More expensive than software.
+
+ - Provides a limited set of encryption algorithms, with no option to
+   upgrade them once they are fixed in silicon.
+
+ - No possibility to audit the hardware implementation to check for
+   backdoors, so you have to trust that the hardware vendor has not
+   been secretly required to provide a backdoor by some government.
+
+ - Typical cryptoprocessors originate from mobile or embedded
+   networking hardware, both of which need to support TLS, and hence
+   cryptoprocessors typically have support for AES, DES, 3DES and SHA.
+   This is sufficient for accelerating the common cipher suites in TLS
+   and IPsec.
+
+ - Have to be supported by the Linux kernel crypto API (`/dev/crypto`) in
+   order to be usable from software.
+
+### Hardware security module
+
+ - Highest encryption bandwidth.
+
+ - Minimal attack surface area, with keys never leaving the HSM.
+
+ - All hardware is tamper-proof and tamper-evident, and typically can
+   destroy stored keys automatically if tampering is detected.
+
+ - Hardware is almost universally FIPS-certified.
+
+ - Most expensive.
+
+ - Provides a range of encryption algorithms, but with no option to
+   upgrade them.
+
+ - No possibility to audit the hardware implementation to check for
+   backdoors, so you have to trust that the hardware vendor has not
+   been secretly required to provide a backdoor by some government.
+
+ - Some modules can handle encryption of network streams transparently,
+   taking a plaintext network stream as input and handling all TLS or
+   IPsec operations for it with peers.
+
+### Conclusion
+
+According to [one evaluation][cryptopp-eval], using encryption acceleration
+instructions should reduce the number of cycles per byte for AES
+encryption from 28 to 3.5. Assuming the inter-domain connection is being
+used to transmit an HD video at 250kB·s⁻¹, that means
+encryption requires 7MHz of CPU compute without acceleration, and 875kHz
+with it. Performing symmetric encryption on a data stream doesn’t
+significantly increase the required memory bandwidth compared to copying
+the stream around without encryption.
+
+Hence, overall, if we assume a peak bandwidth requirement on the
+inter-domain communications link on the order of
+250kB·s⁻¹, then using software encryption with
+acceleration instructions should give sufficient performance.
+
+The hardware security (tamper-proofing) provided by an HSM is overkill
+for an in-vehicle system, and is better suited to data centres or
+military equipment. We recommend either using software encryption with
+acceleration, or a secure cryptoprocessor, depending on the balance of
+the advantages and disadvantages of the two for the particular OEM and
+vehicle. For the purposes of this design, both options provide all
+features necessary for inter-domain communications.
+
+## Appendix: Audio and video streaming standards
+
+There are several standards to enable reliable audio and video streaming
+between various systems. These standards aim to address the synchronization
+problem with different approaches.
+
+ - [AES67]: The AES67 standard combines PTP and RTP using PTP clock source
+   signalling ([RFC7273]) to synchronize multiple streams with an external clock,
+   focusing on high-performance audio based on RTP/UDP.
+
+ - VSF TR-03: This is a technical recommendation from the [Video Services Forum](http://www.videoservicesforum.org/) (VSF).
+   The TR-03 standard is similar to AES67 in terms of using PTP for clock synchronization,
+   but it extends AES67 to cover other kinds of uncompressed streams, including
+   video and metadata.
+
+ - [AVB]: Audio Video Bridging (AVB) is a small extension to standard layer-2
+   MACs and bridges. An advantage of AVB is that the time synchronization information
+   is periodically exchanged through the network, so it provides high synchronization
+   precision. However, it requires AVB to be implemented by every device in the network,
+   because each device must allocate a fraction of the network bandwidth for AVB traffic.
+
+The following comparison table depicts the characteristics of the standards.
+
+| Characteristic              | AES67          | VSF TR-03      | AVB                      |
+| --------------------------- | -------------- | -------------- | ------------------------ |
+| Time synchronization        | external (PTP) | external (PTP) | supported by the network |
+| Kernel support              | not required   | not required   | required                 |
+| Transport protocol          | RTP            | RTP            | RTP, HTTP(s), IEEE 1722  |
+| Related open source project | GStreamer      | N/A            | OpenAvnu                 |
+
+Note that VSF TR-03 has no explicit open source implementation, but as it combines RTP
+for transport and PTP for clock synchronization, it is generally supported by GStreamer.
+
+## Appendix: Multiplexing RTP and RTCP
+
+RTP requires the RTP Control Protocol (RTCP) to exchange control packets and timing information
+such as latency and QoS. Usually RTP and RTCP use two different channels on different network
+ports, but it is also possible to use a single port for both protocols using the [RFC 5761](https://tools.ietf.org/html/rfc5761)
+standard, supported by the GStreamer `funnel` element.
+
+The following diagram shows how an RFC 5761 pipeline can be set up in GStreamer:
+
+```
+/--------\     /---------\     /--------\          /---------------\          /--------\
+| audio  | === | audio   | === |        | == rtp = | rtp payloader | = rtp == |        |     /----------\
+| source |     | convert |     | rtpbin |          \---------------/          | funnel | === | udp sink |
+\--------/     \---------/     |        | ============================ rtcp = |        |     \----------/
+                               \--------/                                     \--------/
+```
+
+## Appendix: Audio and video decoding
+
+As a system which handles a lot of multimedia, deciding where to perform
+audio and video decoding is important. There are two major
+considerations:
+
+ - minimising the amount of raw communications bandwidth which is
+   needed to transmit audio or video data between the domains; and
+
+ - ensuring that an exploit does not give access to arbitrary memory
+   from either domain (especially not the automotive domain).
+
+The discussion below refers to video encoding and decoding, but the same
+considerations apply equally well to audio.
+
+Software encoding is a large CPU burden, and introduces quality loss
+into videos — so decoding and re-encoding videos in one domain to check
+their well-formedness is not a viable option. If decoding is being
+performed, the decoded output might as well be used in that form, rather
+than being re-encoded to be sent to the other domain.
+
+In order to avoid spending a lot of CPU time and CPU–memory bandwidth on
+video decoding, it should be performed by hardware. However, this
+hardware does not necessarily have to be in the domain where the encoded
+video originates. For example, it is entirely possible for videos to be
+sent from the CE to be decoded in the AD.
+
+The original designs which were discussed in combination with the GPU
+video sharing design planned to create a GStreamer plugin in the CE
+which treats the AD as a hardware video decoder which accepts encoded
+video, decodes it, and returns a handle which can be passed to the GL
+scene being output by the CE, via a GL extension (similar to
+[EXT_image_dma_buf_import][]). This is the same model as used for
+‘normal’ hardware decoders, and ensures that decoded video data
+remains within the AD, rather than being sent back over the inter-domain
+communications link (which would incur a very high bandwidth cost,
+which for uncompressed 1080p video in YUV 422 format at 60fps amounts to
+16 bits∕pixel × (1920 × 1080) pixels∕frame × 60 frames∕s = 1898 Mbit∕s = 237 MB∕s).
+
+Regarding security, a hardware decoder is typically a [DMA]-capable
+peripheral, which means that, unless constrained by an [IOMMU], it
+can access all areas of physical memory. The threat here is that a
+malicious or corrupt video could trigger the decoder into reading or
+writing to areas of memory which it shouldn’t, which could allow it to
+overwrite parts of the (hypervisor) operating system or running
+applications. This concern exists regardless of which domain is driving
+the decoder. We highly recommend choosing hardware which uses an
+IOMMU to restrict the access a video decoder has to physical memory.
+
+Note that the same security threat applies to the GPU, which has direct
+access to physical memory (if shared with the CPU — some systems use
+dedicated memory for the GPU, in which case the issue isn’t present).
+GPUs have a much larger attack surface, as they have to handle complex
+GL commands which are provided from untrusted sources, such as WebGL.
+
+We recommend investigating the hardening and security applied to video
+decoders on the particular hardware platforms in use, but there is not
+much which can be done by software to improve their security if it is
+lacking — the performance cost is too high.
+
+### Memory bandwidth usage on the i.MX6 Sabrelite
+
+This section refers to some benchmarks evaluating the available memory
+bandwidth on the i.MX6 Sabrelite platform used in the reference hardware for
+Apertis. This data is very system dependent, but the order of magnitude should
+provide a general guide for evaluating approaches.
+
+The [iMX6 memory bandwidth usage benchmark](https://developer.ridgerun.com/wiki/index.php?title=IMX6_Memory_Bandwidth_usage)
+describes some tools that can be used to measure how memory is used, and
+reports that a
+[1080p @ 60fps loopback pipeline](https://developer.ridgerun.com/wiki/index.php?title=IMX6_Memory_Bandwidth_usage#1080p60_loopback)
+using GStreamer requires up to 1744.46 MB/s of memory bandwidth.
+
+Another useful benchmark is the one evaluating
+[the cost of memory copies](https://community.nxp.com/thread/309197) done with
+the `memcpy()` function. The effective usable memory bandwidth measured with
+this test amounts to roughly 800 MB/s.
+
+### Security vulnerabilities in GStreamer
+
+To list vulnerabilities by type, we can refer to the statistics available
+from the [CVE](http://cve.mitre.org/) data source.
+
+According to the [CVE Details](https://www.cvedetails.com) website,
+a third party that provides summaries of CVE vulnerabilities,
+GStreamer has had [a total of 17 vulnerabilities](https://www.cvedetails.com/vendor/9481/Gstreamer.html) since 2009.
+
+Examining the DoS and Code Execution vulnerability types, the statistics
+showed different characteristics for demuxers and decoders. There have been
+12 DoS vulnerabilities affecting demuxers, but only one issue could lead
+to Code Execution. For decoders, all of the 5 DoS issues which were found
+can be escalated to Code Execution as well.
+
+This report indicates that demuxers might have a smaller attack surface than
+decoders from the arbitrary code execution viewpoint. However, it is also
+possible to have a security hole similar to [][Video or audio decoder bugs].
+
+Both demuxing and possibly even decoding in the CE can help to mitigate the
+described attacks. If the CE is responsible for demuxing, the AD does not
+need to deal with content detection and container formats, and the CE provides
+a kind of partial verification of the stream even without decoding it.
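+
+As a rough illustration of the demux-in-the-CE approach, the following
+sketch builds a pipeline which parses a container in the CE domain and
+forwards the still-encoded video as RTP. The element choices, file name
+and destination address are illustrative assumptions, and a real system
+would route the stream over an inter-domain data channel rather than a
+raw UDP socket.
+
+``` c
+/* Sketch: demux in the CE and forward the still-encoded elementary
+ * stream, so the AD never has to parse container formats itself.
+ * Element choices, file name and address are illustrative assumptions. */
+#include <gst/gst.h>
+
+int
+main (int argc, char **argv)
+{
+  GError *error = NULL;
+
+  gst_init (&argc, &argv);
+
+  GstElement *pipeline = gst_parse_launch (
+      "filesrc location=movie.mkv ! matroskademux "
+      "! h264parse ! rtph264pay "
+      "! udpsink host=192.168.166.1 port=5004",
+      &error);
+
+  if (pipeline == NULL)
+    {
+      g_printerr ("Failed to build pipeline: %s\n", error->message);
+      g_clear_error (&error);
+      return 1;
+    }
+
+  gst_element_set_state (pipeline, GST_STATE_PLAYING);
+
+  /* Block until an error occurs or the stream finishes. */
+  GstBus *bus = gst_element_get_bus (pipeline);
+  GstMessage *msg = gst_bus_timed_pop_filtered (
+      bus, GST_CLOCK_TIME_NONE, GST_MESSAGE_ERROR | GST_MESSAGE_EOS);
+
+  if (msg != NULL)
+    gst_message_unref (msg);
+  gst_object_unref (bus);
+  gst_element_set_state (pipeline, GST_STATE_NULL);
+  gst_object_unref (pipeline);
+
+  return 0;
+}
+```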
+
+Decoding in the CE poses some challenges in terms of bandwidth, as the amount
+of data generated by fully decoded video streams is very high. It's not going
+to be a viable solution on Ethernet-based setups, and advanced zero-copy mechanisms
+to transfer frames are recommended on single-board setups (virtualised or container-based).
+
+[conc-dis-sys]: https://www.cl.cam.ac.uk/teaching/1516/ConcDisSys/materials.html
+
+[trusted path]: https://en.wikipedia.org/wiki/Trusted_path
+
+[bandwidth management]: https://en.wikipedia.org/wiki/Bandwidth_management
+
+[Security concept design]: https://wiki.apertis.org/ConceptDesigns
+
+[ucam-cl-tr-630]: http://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-630.html
+
+[Conditional Access design]: https://wiki.apertis.org/Conditional_Access
+
+[GAsyncResult]: https://developer.gnome.org/gio/stable/GAsyncResult.html
+
+[IPsec]: https://en.wikipedia.org/wiki/IPsec
+
+[TLS]: https://en.wikipedia.org/wiki/Transport_Layer_Security
+
+[crypto-eval]: https://www.schneier.com/cryptography/archives/2003/12/a_cryptographic_eval.html
+
+[RFC 4301]: https://tools.ietf.org/html/rfc4301
+
+[RFC 6040]: https://tools.ietf.org/html/rfc6040
+
+[RFC 7619]: https://tools.ietf.org/html/rfc7619
+
+[RFC 5077]: https://tools.ietf.org/html/rfc5077
+
+[head of line blocking]: https://en.wikipedia.org/wiki/Head-of-line_blocking
+
+[loopback interface]: https://en.wikipedia.org/wiki/Loopback#Virtual_loopback_interface
+
+[neighbour discovery]: https://en.wikipedia.org/wiki/Secure_Neighbor_Discovery
+
+[Heartbleed]: https://en.wikipedia.org/wiki/Heartbleed
+
+[socket transport]: http://dbus.freedesktop.org/doc/dbus-specification.html#transports-tcp-sockets
+
+[dbus-stability]: http://dbus.freedesktop.org/doc/dbus-specification.html#stability
+
+[DNS-SD]: https://en.wikipedia.org/wiki/Zero-configuration_networking#DNS-SD
+
+[SSDP]: https://en.wikipedia.org/wiki/Simple_Service_Discovery_Protocol
+
+[Traffic control]: https://en.wikipedia.org/wiki/Network_traffic_control
+
+[linux-traffic-control]: http://tldp.org/HOWTO/Traffic-Control-HOWTO/intro.html
+
+[well-known names]: http://dbus.freedesktop.org/doc/dbus-specification.html#message-protocol-names-bus
+
+[dbus-api-versioning]: http://dbus.freedesktop.org/doc/dbus-api-design.html#api-versioning
+
+[Debug and Logging design]: https://wiki.apertis.org/mediawiki/index.php/ConceptDesigns
+
+[Bustle]: http://willthompson.co.uk/bustle/
+
+[dbus-monitor]: http://dbus.freedesktop.org/doc/dbus-monitor.1.html
+
+[ws-decrypt]: https://ask.wireshark.org/questions/12019/how-can-i-decrypt-ikev1-andor-esp-packets
+
+[ws-decrypt-plugin]: https://ask.wireshark.org/questions/44562/update-decryption-table-from-lua
+
+[security-seal]: https://en.wikipedia.org/wiki/Security_seal
+
+[trusted platform module]: https://en.wikipedia.org/wiki/Trusted_Platform_Module
+
+[dbus-spec]: http://dbus.freedesktop.org/doc/dbus-specification.html
+
+[AFLv2.1]: https://spdx.org/licenses/AFL-2.1.html
+
+[GPLv2]: http://spdx.org/licenses/GPL-2.0+
+
+[LGPLv2.0]: http://spdx.org/licenses/LGPL-2.0+
+
+[will-dbus-perf]: https://desktopsummit.org/sites/www.desktopsummit.org/files/will-thompson-dbus-performance.pdf
+
+[dbus-signal-perf]: http://blog.asleson.org/index.php/2015/09/01/d-bus-signaling-performance/
+
+[ipc-perf]: https://blogs.gnome.org/abustany/2010/05/20/ipc-performance-the-return-of-the-report/
+
+[dbus-python]: http://www.freedesktop.org/wiki/Software/DBusBindings/
+
+[kdbus]: http://www.freedesktop.org/wiki/Software/systemd/kdbus/
+
+[D-Bus API design]:
http://dbus.freedesktop.org/doc/dbus-api-design.html
+
+[dbus-fd-handles]: http://dbus.freedesktop.org/doc/dbus-specification.html#idp9446907251
+
+[AES instruction set]: https://en.wikipedia.org/wiki/AES_instruction_set
+
+[CLMUL]: https://en.wikipedia.org/wiki/CLMUL_instruction_set
+
+[ARM cryptography extensions]: http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0514g/index.html
+
+[secure-cryptoprocessor]: https://en.wikipedia.org/wiki/Secure_cryptoprocessor
+
+[hw-secu-module]: https://en.wikipedia.org/wiki/Hardware_security_module
+
+[FIPS]: https://en.wikipedia.org/wiki/FIPS_140-2
+
+[gnutls-cert]: http://www.gnutls.org/manual/html_node/Certification.html
+
+[openssl-cert]: https://www.openssl.org/docs/fipsvalidation.html
+
+[rhel-cert]: https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Security_Guide/sect-Security_Guide-Federal_Standards_And_Regulations-Federal_Information_Processing_Standard.html
+
+[ncc]: http://www.zdnet.com/article/ncc-group-to-audit-openssl-for-security-holes/
+
+[AES-GCM]: https://en.wikipedia.org/wiki/Advanced_Encryption_Standard
+
+[cryptopp-eval]: https://groups.google.com/forum/msg/cryptopp-users/5x-vu0KwFRk/CO8UIzwgiKYJ
+
+[EXT_image_dma_buf_import]: https://www.khronos.org/registry/egl/extensions/EXT/EGL_EXT_image_dma_buf_import.txt
+
+[DMA]: https://en.wikipedia.org/wiki/Direct_memory_access
+
+[IOMMU]: https://en.wikipedia.org/wiki/Input-output_memory_management_unit
+
+[RTP]: https://en.wikipedia.org/wiki/Real-time_Transport_Protocol
+
+[RTCP]: https://en.wikipedia.org/wiki/RTP_Control_Protocol
+
+[NTP]: https://en.wikipedia.org/wiki/Network_Time_Protocol
+
+[PTP]: https://en.wikipedia.org/wiki/Precision_Time_Protocol
+
+[AES67]: https://en.wikipedia.org/wiki/AES67
+
+[AVB]: https://en.wikipedia.org/wiki/Audio_Video_Bridging
+
+[RFC7273]: https://tools.ietf.org/html/rfc7273
+
+[dbus-sharp]: https://github.com/mono/dbus-sharp
+
+[GDBus]: https://developer.gnome.org/gio/stable/gdbus.html
+
+[pydbus]: https://github.com/LEW21/pydbus
+
+[QtDBus]: http://doc.qt.io/qt-5/qtdbus-index.html
+
+[sd-bus]: https://github.com/systemd/systemd/blob/master/src/systemd/sd-bus.h
+
+[node-dbus]: https://github.com/sidorares/node-dbus
+
+[libdbus]: https://dbus.freedesktop.org/doc/api/html/
+
+[man 2 link]: https://manpages.debian.org/stretch/manpages-dev/link.2.en.html
+
+[man 2 copy_file_range]: https://manpages.debian.org/stretch/manpages-dev/copy_file_range.2.en.html
diff --git a/content/designs/internationalization.md b/content/designs/internationalization.md
new file mode 100644
index 0000000000000000000000000000000000000000..46e6afb1650008190fbd59e79df56334e6253c0d
--- /dev/null
+++ b/content/designs/internationalization.md
@@ -0,0 +1,803 @@
+---
+title: Internationalization
+short-description: Internationalization and localization in Apertis
+  (partially-implemented, no support for switching applications
+  without restarting them)
+authors:
+  - name: Tomeu Vizoso
+  - name: Philip Withnall
+---
+
+# Internationalization
+
+## Introduction
+
+This design explains how the Apertis platform will be made localizable
+and how it will be localized to specific locales.
+
+“Internationalization” (“i18n”) is the term used for the process of
+ensuring that a software component can be localized. “Localization”
+(“l10n”) is the process of adding the necessary data and configuration
+so that internationalized software adapts to a specific locale.
A locale
+is the definition of the subset of a user's environment that depends on
+language and cultural conventions.
+
+All this will be done with the same tools used by GNOME and we do not
+anticipate any new development in the middleware itself, though UI
+components in the Apertis shell and applications will have to be
+developed with internationalization in mind, as explained in this
+document.
+
+For more detailed information on how translation is done in the FOSS
+world, a good book on the subject is [available][open-translation-tools].
+
+## Internationalization
+
+### Text input
+
+Some writing systems require special software support for entering
+text; the component that provides this support for a specific writing
+system is called an input method. There is a framework for input methods
+called [IBus] that is the most common way of providing input methods
+for the different writing systems. Several input methods based on IBus
+are available in Ubuntu, and it is very unlikely that any requirements
+will not be covered by them. An older, but more broadly-supported, input
+method framework is [SCIM] and an even older one is [XIM].
+
+The advantage of using an input method framework (instead of adding the
+functionality directly to applications or widget libraries) is that the
+input method will be usable in all the toolkits that have support for
+that input method framework.
+
+Note that currently there is almost no support in Clutter for using
+input methods. Lead Clutter developer Emmanuele Bassi recommends doing
+something similar to GNOME Shell, which uses [`GtkIMContext`][GtkIMContext] on
+top of
+[`ClutterText`][StEntry], which would imply depending on GTK+. There's a project
+called clutter-imcontext that provides a simple version of GtkIMContext
+for use in Clutter applications, but Emmanuele strongly discourages its
+use. GTK+ and Qt support XIM, SCIM and IBus.
+
+In order to add support for GtkIMContext to ClutterText, please see how
+it's done in [GNOME Shell][st-im-text]. As can be seen, this implementation
+calls the following functions from the [GtkIMContext] API:
+
+ - `gtk_im_context_set_cursor_location`
+ - `gtk_im_context_reset`
+ - `gtk_im_context_set_client_window`
+ - `gtk_im_context_filter_keypress`
+ - `gtk_im_context_focus_in`
+ - `gtk_im_context_focus_out`
+
+Between the code linked above and the GTK+ API reference it should be
+reasonably clear how to add GtkIMContext support to Clutter
+applications, but there's also the possibility of reusing that code
+instead of having to rewrite it. In that case, we advise taking into
+account the license of the file in question (LGPL v2.1).
+
+For systems without a physical keyboard, text can be entered via a
+virtual keyboard. The UI toolkit will invoke the on-screen keyboard when
+editing starts, and will receive the entered text once it has finished.
+So that the on-screen keyboard can be used for text input by a wide
+variety of UI toolkits, Collabora recommends that it use IBus.
+
+The reason for recommending an input-method framework is that
+most toolkits have support for it: if an application that uses Qt is
+reused, the on-screen keyboard will work without any specific
+modification, which wouldn't be the case if `GtkIMContext` were used.
+
+As for why to use IBus over other input-method frameworks: IBus is
+already supported by most modern toolkits, has a very active upstream
+community, and the cost of developing input methods with IBus is lower
+than with other frameworks. 
Currently, IBus is the default
+input method framework in Ubuntu and Fedora, and GNOME is considering
+dropping support for other frameworks’ input methods.
+
+### Text display
+
+For text layout and rendering the toolkit needs to support all writing
+systems we are interested in. GTK+ and Clutter use Pango, which supports
+a very broad set of natural language scripts. The appropriate fonts need
+to be present so Pango can render text.
+
+The recommended mechanism for translating those pieces of text that are
+displayed in the UI is to export those strings to a file, get them
+translated in additional files and then have the application use the
+appropriate translated strings at runtime depending on the current
+locale. GNU gettext implements this scheme and is very common in the
+FOSS world. Gettext also allows adding a comment to the string to be
+translated, giving the translator more context to better understand
+how the string is used in the UI. This additional context can also be
+used to encode additional information as explained later. The GNU
+[gettext] manual is comprehensive and covers all this in detail.
+
+This is an example of all the metadata that a translated string can have
+attached:
+
+```
+#. Make sure you use the IEC equivalent for your language
+## Have never seen KiB used in our language, so we'll use KB
+#: ../glib/gfileutils.c:2007
+#, fuzzy, c-format
+
+msgctxt "File properties dialog"
+msgid "%.1f KiB"
+msgstr "%.1f KB"
+```
+
+For strings embedded inside [ClutterScript] files, ClutterScript supports a
+`translatable` property to mark the string as translatable. So to mark the text
+of a `ClutterText` as translatable, the following ClutterScript should be used:
+
+``` json
+"label" : {
+  "text" : {
+    "translatable" : true,
+    "string" : "Label Text"
+  }
+}
+```
+
+Note that `clutter_script_set_translation_domain()` or
+[`textdomain()`][textdomain] needs to be called before translatable strings can
+be used in a ClutterScript file.
+
+[gettext] currently does not support extracting strings from ClutterScript
+files; support for that needs to be added.
+
+Previous versions of this document recommended using [intltool]. However, in
+recent years, it has been superseded by [gettext]. Previously, gettext was
+unmaintained, and intltool was developed to augment it; now that gettext is
+actively maintained and gaining new features, intltool is no longer necessary.
+
+#### Message IDs
+
+It is most common in FOSS projects (especially those using GNU gettext)
+to use the English translation as the identifier for the occurrence of a
+piece of text that needs to be translated, though some projects use an
+identifier that can be numeric (`T54237`) or a mnemonic (`PARK_ASSIST_1`).
+The IDs will not leak to the UI if the translations are complete, and
+there is also the possibility of defining a fallback language.
+
+There are two main arguments used in favor of using something other than
+plain English as the ID:
+
+ - so that when the English translation is changed in a trivial way,
+   that message isn't marked as needing review for all other languages;
+ - and to avoid ambiguities, as “Stop” may refer to an action or a
+   state and thus may be translated differently in some languages,
+   while using the IDs `state_stop` and `action_stop` would remove
+   that ambiguity. 
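+
+To illustrate how such IDs behave at runtime, a compiled catalogue can be
+queried with the gettext(1) command-line tool. This is only a sketch; the
+`myapp` domain, the catalogue directory and the ID are hypothetical:
+
+``` sh
+# Look up the German translation of a synthetic ID in the "myapp" domain
+LANGUAGE=de TEXTDOMAINDIR=/usr/share/locale gettext -d myapp "PARK_ASSIST_1"
+# Prints the German translation if one exists; if the catalogue does not
+# contain this ID, the raw string "PARK_ASSIST_1" is printed, which is how
+# untranslated IDs would leak into the UI.
+```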
+
+When using gettext, the first argument loses some strength as it
+includes a tool that is able to merge the new translatable string with
+the existing translations, while marking them as in need of review. As
+for the argument about avoiding ambiguity, GNU gettext was extended to
+provide a way of attaching additional context to a message, so that is
+no longer a problem.
+
+Regarding advantages of using plain English (or other natural language)
+as the message ID:
+
+ - better readability of the code,
+ - when the developers add new messages to the application and run it,
+   they will see the English strings, which are closer to what the user
+   will see than any other kind of IDs.
+
+From the above it can be understood why it's normally recommended to
+just use the English translation as the placeholder in the source code
+when using GNU gettext.
+
+Regarding consistency, there's a slight advantage in using natural
+language strings because when entering translations the translation
+software may offer suggestions from the translation memory, and given
+that mnemonic IDs are likely to be unique, there will be fewer exact
+matches.
+
+Because of the need to associate with each translation metadata such as
+the font size and the available space, plus having product variants that
+share most of the code but can have differences in fonts and widget
+sizes, we recommend using mnemonics as IDs, which would allow us to
+keep a list of the translatable strings and their associated fonts and
+pixel widths for each variant. This will be further discussed in
+[][Testing].
+
+This diagram illustrates the workflow that would be followed during
+localization.
+
+
+
+
+For better readability of the source code we recommend that the IDs
+chosen suggest the meaning of the string, such as *PARK\_ASSIST\_1*.
+Instead of having to specify whole font descriptions for each string to
+translate, Collabora recommends using styles that expand to specific
+font descriptions.
+
+Here is an example of such a metadata file; note the font styles `NORMAL`,
+`TITLE` and `APPLICATION_LIST`:
+
+```
+PARK_ASSIST_1 NORMAL 120px
+PARK_ASSIST_2 NORMAL 210px
+SETTINGS_1 TITLE 445px
+BROWSER APPLICATION_LIST 120px
+```
+
+And here is the PO file that would result after merging the metadata in,
+ready to be uploaded to Transifex:
+
+```
+#. NORMAL,120px
+#: ../preferences.c:102
+msgid "PARK_ASSIST_1"
+msgstr "Park assist"
+#. NORMAL,210px
+#: ../preferences.c:104
+msgid "PARK_ASSIST_2"
+msgstr "Park assist"
+```
+
+If for some reason some source code is reused that uses English for its
+translation IDs and the rest of the application or library uses
+synthetic IDs, Collabora recommends having a separate domain for each
+section of the code, so all the English IDs are in one PO file and the
+synthetic IDs in another. In this case, note that matching metadata to
+individual strings can be problematic if the metadata isn't updated when
+the string IDs change. It will be a problem as well if there are several
+occurrences of exactly the same string.
+
+When it is needed to modify the metadata related to existing strings,
+the process consists of modifying the file containing string metadata,
+then merging it again with the PO files from the source code and
+importing it into the translation management system.
+
+#### Consistency
+
+Translation management systems offer tools to increase the consistency
+of the translations, so the same words are used to explain the same
+concept. 
One of the tools that Transifex offers
+is a search feature that allows translators to quickly check how a word
+has been translated in other instances. Another is the *translation
+memory* feature, which suggests translations based on what has been
+translated already.
+
+There isn't any relevant difference in how these tools work whether the
+strings are identified by synthetic IDs or by their English
+translations.
+
+### UI layout
+
+Some languages are written in orientations other than left to right and
+users will expect that the UI layout takes this into account. This means
+that some horizontal containers will have to lay out their children in
+reverse order, labels linked to a widget will also be mirrored, and some
+images used in icons will have to be mirrored horizontally as well.
+
+Here is an example of an application running under a locale whose
+orientation is right-to-left; note the alignment of icons in the toolbar
+and the position of the arrows in submenus:
+
+
+
+
+## Localization
+
+### Translation
+
+#### GNU gettext
+
+Most of the work happens in the translation phase, in which .po files
+are edited so they contain appropriate translations for each string in
+the project. As illustrated in the diagram below, the .po files
+generated from the original .pot file serve as the basis for starting
+the translation. When the source code changes and thus a different .pot
+file gets generated, GNU gettext includes a tool for merging the new
+.pot file into the existing .po files so translators can work on the
+latest code.
+
+This diagram illustrates the [workflow][gnu-gettext-process] when using
+GNU gettext to translate text in an application written in C:
+
+
+
+
+From time to time, new translatable strings need to be extracted from
+the source code to update the files that are used by translators. The
+extraction itself is performed by the tool [xgettext], which
+generates a new POT file containing all the translatable strings plus
+their locations in the source code and any additional context.
+
+These are the [programming languages][gettext-languages]
+supported by GNU gettext: C,
+C++, ObjectiveC, PO, Python, Lisp, EmacsLisp, librep, Scheme, Smalltalk,
+Java, JavaProperties, C\#, awk, YCP, Tcl, Perl, PHP, GCC-source,
+NXStringTable, RST and Glade.
+
+The POT file and each PO file are fed to [msgmerge], which merges the
+existing translations for that language into the POT file. Strings that
+haven't been changed in the source code get automatically merged, and the
+remaining ones are passed through a fuzzy algorithm that tries to find the
+corresponding translatable string. Those strings that had a fuzzy match
+are marked as needing review. If strings are indexed with unique IDs
+instead of the English translation, then it's recommended to use the
+`--no-fuzzy-matching` option to msgmerge, so new IDs will always be empty.
+Otherwise, if the POT file already contained an entry for
+`PARK_ASSIST_1` and `PARK_ASSIST_2` was added, then when merging into
+existing translations the existing translation would be reused, with the
+entry marked as fuzzy (which would cause Transifex to use that
+translation as a suggestion).
+
+#### Translation management
+
+Though these file generation steps can be executed manually with command
+line tools and translators can work directly on the `.po` files with any
+text editor, there are higher-level tools that aim to manage the
+whole translation process. Next we briefly mention the ones most
+commonly used in FOSS projects. 
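+
+For reference, the manual command-line workflow that these higher-level
+tools build on looks roughly like this. It is only a sketch, and the file,
+domain and language names are hypothetical:
+
+``` sh
+# Extract translatable strings from the sources into a POT template
+xgettext --from-code=UTF-8 --add-comments --output=myapp.pot src/*.c
+
+# Merge the new template into an existing German translation;
+# --no-fuzzy-matching is recommended when synthetic IDs are used
+msgmerge --no-fuzzy-matching --update po/de.po myapp.pot
+
+# Compile the updated translation into the binary MO format used at runtime
+msgfmt --output-file=de.mo po/de.po
+```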
+
+[Pootle], [Transifex] and Launchpad [Rosetta] are tools
+which provide convenient UIs for translating strings. They also
+streamline the process of translating strings from new `.pot` versions and
+offer ways to transfer the resulting .po files to source code
+repositories.
+
+Pootle is the oldest web-based translation management system and is
+mature, but a bit lacking in features. Maintaining an instance requires a
+fair amount of experience.
+
+Transifex is newer and was created to accommodate the actual workflows
+of most projects today better than Pootle does. Its UI is richer in
+features that facilitate translation and, more importantly, it has good
+commercial support (by [Indifex]). It also provides an API that
+can be used to integrate it with other systems. See [][Transifex] for more
+details.
+
+Launchpad is not easily deployable outside launchpad.net and is very
+oriented to Ubuntu's workflow, so we do not recommend its usage.
+
+Both Pootle and Transifex have support for translation memory, which
+aids in keeping the translation consistent by suggesting new
+translations based on older ones.
+
+If for some reason translators prefer to use a spreadsheet instead of
+web UIs or manually editing the PO files, [csv2po] will convert a PO
+file to a spreadsheet and back, so the translation system
+can be refreshed with the new translations.
+
+*po2csv* will convert a PO file to a CSV one which has a column for the
+comments and context, another for the *msgid* and one more for the
+translation for the given language. *csv2po* will do the opposite
+conversion.
+
+It's very likely that the CSV format that these tools generate and
+expect doesn't match exactly what is needed, so an additional step
+will be needed to convert the CSV file to the required spreadsheet
+format, and another step to do the opposite conversion.
+
+#### Transifex
+
+In this section we discuss in more detail some aspects of Transifex.
+For an overview of other features of Transifex, please see the
+documentation for [management][transifex-management]
+and [translation][transifex-translation].
+
+##### Deployment options
+
+Transifex is available as a hosted web service in
+[*http://www.transifex.net*](http://www.transifex.net/) and there are
+several [pricing options][transifex-plans] depending on the project size,
+features and level of technical support desired.
+
+The FOSS part of Transifex is available as Transifex Community Edition
+and can be freely downloaded and installed on any machine with a
+minimally modern and complete Python installation. This version lacks
+some of the features that are available in
+[*http://transifex.net*](http://transifex.net/) and in the Enterprise
+Edition. The installation manual for the community edition is in
+[*http://help.transifex.net/server/install.html*](http://help.transifex.net/server/install.html).
+
+The hosted and the enterprise editions support these features in
+addition to what the community edition supports:
+
+ - Translation memory
+ - Glossary
+ - Improved collaboration between translators
+ - Improved UI theme
+
+The advantage of the hosted edition is that it is updated more
+frequently (weekly) and that in the future it will be possible to order
+paid translations through the platform.
+
+Transifex currently cannot estimate the space that a given translation
+will take and will need to be extended in this regard.
+
+It also fully supports using synthetic translation IDs instead of
+English or other natural language. 
+
+Finally, Indifex provides commercial support for the enterprise edition
+of Transifex, which can either be self-hosted or provided as SaaS. Their
+portfolio includes assistance with deployment, consultancy services on
+workflow and customization, and a broad package of
+[technical support][transifex-support].
+
+##### Maintenance
+
+Most maintenance is performed through the web interface, by registered
+users of the web service with the appropriate level of access. This
+includes setting up users, teams, languages and projects. Less frequent
+tasks such as instance configuration, software updates, performance
+tuning and setup of automatic jobs are performed by the administrator
+of the server hosting the service.
+
+##### Translation memory
+
+Transifex will provide suggestions when translating a string based on
+existing [translations][transifex-translation-tm] in the current module
+or in other modules that were configured to share their translation
+[memory][transifex-memory]. This memory can also be used to pre-populate
+translations for a new module based on other modules'
+[translations][transifex-prepopulate].
+
+##### Glossary
+
+Each project has a series of terms that are very important to translate
+consistently or that can have several different possible translations
+with slightly different meanings. To help with this, Transifex provides
+a [glossary][transifex-glossary] that will assist translators in these
+cases.
+
+##### POT merging
+
+As explained in [][GNU Gettext], new translatable strings are extracted
+from the source files with the tool xgettext and the resulting POT file
+is merged into each PO file with the tool msgmerge.
+
+Once the PO files have been updated, the tool `tx` (the command-line
+Transifex client) can be used to submit the changes to the server. The
+differences between the old and new source files will be handled as
+[follows][transifex-push]:
+
+ - New strings will be added.
+ - Modified strings will be considered new ones and added as well.
+ - Strings which do not exist in the new source file (including ones
+   which have been modified) will be removed from the database, along
+   with their translations.
+
+Keep in mind, however, that old translations are kept in the
+Translation Memory of your project.
+
+Note that this process can be automated.
+
+##### Automatic length check
+
+Transifex's database model will have to be updated to store additional
+metadata about each string, such as the font description and the
+available size in pixels. The web application could then check how many
+pixels the entered string would take in the UI, using Pango and
+[Fontconfig]. For better accuracy, the exact fonts that will be used
+in the UI should be used for this computation.
+
+Alternatively, there could be an extra step after each translation phase
+that would spot all the strings that may overflow and mark them as
+needing review.
+
+### Testing
+
+Translations will generally be proofread, but even then we recommend
+testing the translations by running the application to catch a number of
+errors which are noticeable only at run time. This run-time evaluation
+can spot confusing or ambiguous wording, as well as layout problems.
+
+Each translation of a single piece of text can potentially require a
+wildly-differing width due to varying word and expression sizes in
+different languages. 
There are ways for the UI to adapt
+to the different string sizes, but there are limits to how well this can
+work, so translators often need to manually check whether their
+translation fits nicely in the UI.
+
+One way to automatically avoid many instances of layout errors would be
+to have available, during translation and along with the extracted
+strings, the available space in pixels and the exact font description
+used to display the string. This information would allow automatic
+calculation of string sizes, making it possible to catch translations
+that would overflow the boundaries. As explained in [][Message IDs],
+this metadata would be stored in a file indexed by translation ID and
+would be merged in before importing it into the translation management
+software, which could use it to warn when a translated string may be
+too long. For this to consistently work, the translation IDs need to be
+unique (and thus synthetic).
+
+When calculating the length of a translation for a string that contains
+one or more [printf placeholders], the width that the string can
+require when displayed in the UI grows very quickly. For example, for
+the placeholder `%d`, which can display a 32-bit integer value, the final
+string can take up to 10 additional digits. The only way to be safe is
+to assume that each placeholder can be expanded to its maximum size,
+though in the case of strings (placeholder `%s`) that is practically
+unlimited.
+
+If, despite automatically warning the translator when a translation will
+not fit in the UI, some strings are too long, the UI widget that
+displays the string could ellipsize it to indicate that the displayed
+text isn't complete. If this occurred in a debug build, a run-time
+warning could also be emitted. These warnings would be logged only once
+a translated string has been displayed in the UI and wouldn't apply to
+text coming from an external input.
+
+For manual testing, an image could be provided to translators so they
+could easily merge their work and test the software in their locale.
+
+### Other locale configuration
+
+There is some other configuration that is specific to a locale but that
+is not specific to the application. This includes number, date and time
+formats, currency and collation. Most locales are already present in GNU
+glibc, so we would only have to add a locale if it targeted an extremely
+small population group.
+
+## Distribution
+
+There are three main ways of packaging translations:
+
+ - package all the MO files (compiled PO files) along with the rest of
+   the files for a single component (for example gnome-shell in Ubuntu).
+ - package the MO files for a single component (usually a big one such
+   as LibreOffice or KDE) and a specific language in a separate package
+   (for example, [firefox-locale-de] in Ubuntu).
+ - package several MO files corresponding to several components for one
+   language (for example language-pack-cs-base in Ubuntu).
+
+Our recommendation at this stage is to have:
+
+ - each application along with all its existing translations in a
+   single package. This way the user will install e.g.
+   `navigation-helper_1.10_armhf.deb` and the user will be able to
+   switch between all the supported languages without having to install
+   any additional packages.
+ - the rest of the MO files (those belonging to the UI that is
+   pre-installed, such as applications and the shell) would be packaged
+   grouped by language, e.g. `apertis-core-de_2.15_armhf.deb`. 
That way
+   we can choose which languages will be pre-installed and can allow
+   the user to install additional languages on demand.
+
+If we do not want to pre-install all the required fonts and input
+methods for all supported languages, we could have meta-packages that,
+once installed, provide everything that is required to support a
+specific language. The meta-package in Ubuntu that provides support for
+Japanese is a good example of [this][ubuntu-language-ja].
+
+Note that our current understanding is that the whole UI will be
+written from scratch, not reusing any existing UI components that may be
+present in the images. This implies that though some middleware
+components may install translations, those are never expected to be
+seen by the user.
+
+This table should give an idea of the sizes taken by packages
+related to localization:
+
+| Package name              | Contents                                         | Package size | Installed size |
+| ------------------------- | ------------------------------------------------ | ------------ | -------------- |
+| language-pack-de-base     | MO files for core packages                       | 2,497 kB     | 8,432 kB       |
+| firefox-locale-de         | German translation for Firefox                   | 233 kB       | 453 kB         |
+| libreoffice-l10n-de       | Resource files with translations, and templates  | 1,498 kB     | 3,959 kB       |
+| language-support-fonts-ja | Fonts for rendering Japanese                     | 29,006 kB    | 41,728 kB      |
+| ibus-anthy                | Japanese input method                            | 388 kB       | 1,496 kB       |
+
+The *language-support-fonts-ja* package is a virtual one that brings in
+the following other packages (making up the total of 41,728 kB when
+installed):
+
+| Package name      | Contents                                        | Package size | Installed size |
+| ----------------- | ----------------------------------------------- | ------------ | -------------- |
+| ttf-takao-gothic  | Japanese TrueType font set, Takao Gothic Fonts  | 8,194.6 kB   | 12,076.0 kB    |
+| ttf-takao-pgothic | Japanese TrueType font set, Takao P Gothic Font | 4,195.4 kB   | 6,196.0 kB     |
+| ttf-takao-mincho  | Japanese TrueType font set, Takao Mincho Fonts  | 16,617.9 kB  | 23,456.0 kB    |
+
+Modern distributions will bring in all those fonts for Japanese-enabled
+installations, but depending on the commercial requirements, a system
+could make do with just a subset. Similarly, other locales will require
+a set of fonts for properly rendering text in the way that users in
+specific markets expect. In order to recommend specific font files,
+knowledge of the requirements is needed.
+
+## Runtime switching of locale
+
+### Common pattern
+
+A usual way of implementing language switching at runtime is to
+have those UI components that depend on the language listen for a
+signal that gets emitted by a global singleton when the language
+changes. Those components will check the new language and update strings
+and probably change layout if the text direction has changed. Some other
+changes may be needed such as changing the icons, colors, etc.
+
+The Qt toolkit has a bit of support for this solution and their
+[documentation][qt-language-switching] explains in detail how to
+implement it. This can be easily implemented in Clutter and performance
+should be good provided that there isn't an excessive number of actors
+on the stage.
+
+
+
+
+`LocaleManager` in the diagram would be a singleton that stores the
+current locale and notifies interested parties when it changes. The
+current locale would be changed by UI elements such as a combo-box in
+the settings panel, a menu option, etc. 
+
+Other UI elements that take locale-dependent decisions (in the diagram,
+`SettingsWindow`) would register to be notified when the locale changes,
+so they can change their UI (update strings, change icons, change text
+orientation, etc.).
+
+Since systemd version 30, the [systemd-localed service][systemd-localed] has
+been provided as a standard D-Bus API (`org.freedesktop.locale1`) for managing
+the system’s locale, including being notified when it is changed, getting its
+current value, and setting a new value. This should be used in combination with
+the `org.gnome.system.locale` GSettings schema, which stores the *user’s*
+locale preferences. We suggest that the `LocaleManager` from the diagram is
+implemented to query `org.gnome.system.locale` and return the value of its
+`region` setting if set. If not set, the user is using the default system
+locale, which `LocaleManager` should query from `org.freedesktop.locale1`.
+
+`org.freedesktop.locale1` is provided as a D-Bus API only, and
+`org.gnome.system.locale` is a GSettings schema. They are accessed differently,
+so a set of wrapper functions should be written as a convenience for
+application developers.
+
+systemd-localed uses [polkit] to authorise changes to the system locale, so
+vendors would need to write a policy which determines which applications are
+permitted to change the system locale, and which are allowed to query it. The
+default should be that only the system preferences application is allowed to
+change the locale; and all applications are allowed to query it (and be
+notified of changes to the locale).
+
+These snippets show how systemd-localed could be used by an application
+(omitting asynchronous calls for simplicity):
+
+The following example shows how the user’s locale can be queried by an
+application, first checking `org.gnome.system.locale`, then falling back to
+`org.freedesktop.locale1` if the user is using the system locale. It is
+expected that most of the code in this example would be implemented in the
+`LocaleManager`, rather than being reimplemented in every application.
+
+{{ ../examples/locale-region-changed.c }}
+
+### Application helper API
+
+To reduce the amount of work that most application authors will have
+when making their applications aware of runtime locale switches, we
+recommend that the SDK API includes a subclass of `ClutterText` (let's
+call it `ExampleText`) that reacts to locale changes.
+
+`ExampleText` would accept a translatable ID via the function
+`example_text_set_text()`, would display its translation based on the
+current locale and would also listen for locale changes and update
+itself accordingly.
+
+So that xgettext can extract the string IDs that get passed to
+`ExampleText`, it would have to be invoked with
+`--flag=example_text_set_text:1:c-format`.
+
+If applications use `ExampleText` instead of `ClutterText` for the display
+of all their translatable text, they will have to interface with
+`LocaleManager` only if they have to localize other aspects such as icons
+or container orientation.
+
+## Localization in GNOME
+
+GNOME uses a web application called Damned Lies to manage their
+translation work-flow and produce statistics to monitor the translation
+progress. Damned Lies is specifically intended to be used within GNOME,
+and its maintainers recommend that other parties look into a more generic
+alternative such as Transifex. There used to be a separate tool called
+Vertimus, but it has been merged into Damned Lies. 
+
+Participants in the translation of GNOME belong to translation teams,
+one for each language to which GNOME is translated, and they can have
+one of three roles: translator, reviewer and committer. As explained in
+GNOME's [wiki][gnome-translation-contribute]:
+
+> *Translators contains persons helping with GNOME translations into a
+> specific language, who added themselves to the translation team.
+> Translators could add comment to a specific PO file translation, could
+> reserve it for translations and could suggest new translations by
+> upload a new PO file. The suggested translations will be reviewed by
+> other team members.*
+>
+> *Reviewers are GNOME translators which were assigned by the team
+> coordinator to review newly suggested translations (by translators,
+> reviews or committers). They have access to all actions available to a
+> translators with the addition of some reviewing task (ex reserve a
+> translation file for proofreading, mark a translation as being ready
+> to be included in GNOME).*
+>
+> *Committers are people with rights to make changes to the GNOME
+> translations that will be release. Unless a translations is not
+> committed by a committer, it will only remain visible in the web
+> interface, as an attached PO file. Committers have access to all
+> actions of a reviewer with the addition of marking a PO file as
+> committed and archiving a discussion for new suggestions.*
+
+The GNOME work-flow is characterized by everybody being able to suggest
+translations, by having a big body of people who can review those, and by
+tightly controlling who can actually commit to the repositories. The
+possibility of reserving translations also minimizes the chances of
+wasting time translating the same strings twice.
+
+A very popular tool among GNOME translators is [Poedit],
+though the work-flow does not mandate a specific tool
+for the translations themselves, and GNOME translators use several
+tools depending on their personal preferences. 
+ +This graph illustrates their work-flow: + + + +[open-translation-tools]: http://en.flossmanuals.net/open-translation-tools/ + +[IBUS]: http://en.wikipedia.org/wiki/Intelligent_Input_Bus + +[SCIM]: http://en.wikipedia.org/wiki/Smart_Common_Input_Method + +[XIM]: http://www.x.org/releases/X11R7.6/doc/libX11/specs/XIM/xim.html + +[GtkIMContext]: http://developer.gnome.org/gtk/unstable/GtkIMContext.html#GtkIMContext.description + +[StEntry]: http://developer.gnome.org/st/3.3/StEntry.html + +[st-im-text]: http://git.gnome.org/browse/gnome-shell/tree/src/st/st-im-text.c + +[gettext]: http://www.gnu.org/software/gettext/manual/gettext.html + +[intltool]: https://launchpad.net/intltool/ + +[gnu-gettext-process]: http://upload.wikimedia.org/wikipedia/commons/0/05/GNU_gettext_process.png + +[xgettext]: http://www.gnu.org/savannah-checkouts/gnu/gettext/manual/html_node/xgettext-Invocation.html + +[gettext-languages]: http://www.gnu.org/savannah-checkouts/gnu/gettext/manual/html_node/xgettext-Invocation.html#index-supported-languages_002c-_0040cod + +[msgmerge]: http://www.gnu.org/savannah-checkouts/gnu/gettext/manual/html_node/msgmerge-Invocation.html + +[Pootle]: http://translate.sourceforge.net/wiki/pootle/index + +[Transifex]: https://www.transifex.net/ + +[Rosetta]: https://translations.launchpad.net/ + +[Indifex]: http://www.indifex.com/ + +[csv2po]: http://translate.sourceforge.net/wiki/toolkit/csv2po + +[transifex-management]: http://help.transifex.net/intro/projects.html + +[transifex-translation]: http://help.transifex.net/intro/translating.html + +[transifex-plans]: https://www.transifex.net/plans/ + +[transifex-support]: https://www.transifex.net/tour/products/transifexee/ + +[transifex-translation-tm]: http://help.transifex.net/intro/translating.html#user-tm + +[transifex-memory]: http://help.transifex.net/intro/projects.html#setting-up-translation-memory + +[transifex-prepopulate]: http://help.transifex.net/intro/projects.html#prepopulate-translations-with-100-matches-on-tm + +[transifex-glossary]: http://help.transifex.net/intro/translating.html#glossary + +[transifex-push]: http://help.transifex.net/features/client/index.html#push + +[Fontconfig]: http://fontconfig.org/ + +[printf placeholders]: http://pubs.opengroup.org/onlinepubs/9699919799/functions/printf.html + +[firefox-locale-de]: http://packages.ubuntu.com/oneiric/firefox-locale-de + +[ubuntu-language-ja]: http://packages.ubuntu.com/hardy/language-support-ja + +[qt-language-switching]: http://developer.qt.nokia.com/faq/answer/how_can_i_dynamically_switch_between_languages_in_my_application_using_e.g_ + +[gnome-translation-contribute]: https://live.gnome.org/TranslationProject/ContributeTranslations + +[Poedit]: http://www.poedit.net/ + +[systemd-localed]: https://www.freedesktop.org/wiki/Software/systemd/localed/ + +[polkit]: https://www.freedesktop.org/wiki/Software/polkit/ + +[textdomain]: http://linux.die.net/man/3/textdomain diff --git a/content/designs/jenkins-docker.md b/content/designs/jenkins-docker.md new file mode 100644 index 0000000000000000000000000000000000000000..569bb9455cc281c8e0b8d368bfda7eb42c3cdd8d --- /dev/null +++ b/content/designs/jenkins-docker.md @@ -0,0 +1,201 @@ +--- +title: Jenkins and Docker +short-description: Standardizing on Docker as the environment for Jenkins jobs +authors: + - name: Emanuele Aina +--- + +# Jenkins and Docker + +This document provides a high-level overview of the reasons to adopt Docker for +the Jenkins jobs used by the Apertis infrastructure and covers the steps needed +to 
transition existing non-Docker jobs.
+
+## What is Jenkins
+
+Jenkins is the automation server that ties all the components of the Apertis
+infrastructure together.
+
+It is responsible for:
+
+* building source packages from git repositories and submitting them to OBS
+* building ospacks and images
+* submitting test jobs to LAVA
+* rendering documentation from Markdown to HTML and PDF and publishing it
+* building sample app-bundles
+* bundling test helpers
+
+## What is Docker
+
+Docker is the leading system to build, manage and run server applications in
+a containerized environment.
+
+It simplifies reproducibility by:
+* providing an easy way to build container images
+* providing a registry for already built container images
+* isolating the applications using the container images from the host system
+
+## Why Docker with Jenkins
+
+Running Jenkins jobs directly on a worker machine has several drawbacks:
+
+* all the jobs share the same work environment, which can cause unwanted interactions
+* the work environment has to be provisioned manually by installing packages on
+  the machine and hand-tweaking the configuration
+* the work environment has to be kept up-to-date manually
+* reproducing the same work environment on different workers is very error
+  prone as it relies on manual action
+* customizing the work environment needs privileged operations
+* the work environment can't be reproduced on developers' machines
+* conflicting requirements (for instance, building against different releases)
+  cannot be fulfilled as the work environment is shared
+* scaling is complex
+
+Jenkins jobs can instead be configured to use Docker containers as their
+environment, which gives the advantages below:
+
+* each job runs in a separate container, giving more control over resource usage
+* Docker containers are instantiated automatically by Jenkins
+* rebuilding Docker containers from scratch to get the latest updates can be
+  done with a single click
+* the containers provide a reproducible environment across workers
+* Docker container images are built from `Dockerfiles` controlled by developers
+  using the normal review workflow, with no special privileges
+* the same container images used on the Jenkins workers can be used to
+  reproduce the work environment on developers' machines
+* containers are ephemeral: a job changing the work environment does not affect
+  other jobs nor subsequent runs of the same job
+* containers are isolated from each other and make it possible to address
+  conflicting requirements using different images
+* several service providers offer Docker support, which can be used for scaling
+
+## Apertis jobs using Docker
+
+Apertis already uses Docker containers for a few key jobs: in particular the
+transition to Debos has been done by targeting Docker from the start,
+greatly simplifying setup and maintenance compared to the previous mechanism.
+
+### Image recipes
+
+The [jobs building ospacks and images](https://gitlab.apertis.org/infrastructure/apertis-jenkins-jobs/tree/master/image-recipes)
+use the [image-builder](https://gitlab.apertis.org/infrastructure/apertis-docker-images/tree/master/apertis-image-builder)
+Docker container, based on Debian `stretch`. 
+
+A special requirement for those jobs is that `/dev/kvm` must be made accessible
+inside the container: particular care must then be taken for the worker
+machines that will run these jobs, ruling out incompatible virtualization
+mechanisms (for instance VirtualBox) and service providers that cannot provide
+access to the KVM device.
+
+Developers can retrieve and launch the same environment used by Jenkins with a
+single command:
+
+```
+$ docker run -it \
+    docker-registry.apertis.org/apertis/apertis-18.09-image-builder \
+    /bin/bash
+```
+
+### Documentation
+
+The [jobs building the designs and development websites](https://gitlab.apertis.org/infrastructure/apertis-jenkins-jobs/tree/master/hotdoc)
+use the [documentation-builder](https://gitlab.apertis.org/infrastructure/apertis-docker-images/tree/master/apertis-documentation-builder)
+Docker container, based on Debian `stretch`.
+
+Unlike the containers used to build images, the documentation builder does not
+have any special requirements.
+
+### Docker images
+
+The Docker images used are generated and kept up-to-date
+[through a dedicated Jenkins job](https://gitlab.apertis.org/infrastructure/apertis-jenkins-jobs/tree/master/docker-images)
+that checks out the [docker-images](https://gitlab.apertis.org/infrastructure/apertis-docker-images/)
+repository, uses Docker to build all the needed images and pushes them to
+[our Docker registry](http://docker-registry.apertis.org/) to make them
+available to Jenkins and to developers.
+
+## Converting the remaining jobs to Docker
+
+All the older jobs still run directly on a specifically configured worker
+machine. By converting them to use Docker we would get the benefits listed
+above and we would also be able to repurpose the special worker machine to
+become another Docker host, doubling the number of jobs that can be run
+in parallel.
+
+The affected jobs are:
+
+* [`packaging/*`](https://jenkins.apertis.org/job/apertis-18.09/job/packaging/)
+* [`packages/*`](https://jenkins.apertis.org/job/apertis-18.09/job/packages/)
+* [`samples/*`](https://jenkins.apertis.org/job/apertis-18.09/job/samples/)
+* [`apertis-check-commit`](https://jenkins.apertis.org/job/phabricator/job/apertis-check-commit/)
+* [`apertis-master-build-snapshot`](https://jenkins.apertis.org/job/phabricator/job/apertis-master-build-snapshot/)
+* [`apertis-build-package-all-masters`](https://jenkins.apertis.org/job/apertis-build-package-all-masters/)
+
+### Creating a new Docker image
+
+The first step is to create a new Docker image to reproduce the work
+environment needed by the jobs.
+
+A new [package-builder](https://gitlab.apertis.org/infrastructure/apertis-docker-images/tree/master/apertis-package-builder)
+recipe is introduced.
+
+Unlike other images so far, this one is based on Apertis itself rather than
+Debian. This means that a minimal Apertis ospack is produced during the build
+and is then used to seed the `Dockerfile`, which installs all the needed
+packages on top of it.
+
+### Converting the packaging jobs
+
+All the `packages/*` and `packaging/*` jobs are similar, as they involve
+checking out a git tree for a package, launching `build-snapshot` to build it
+against the work environment and submitting the resulting source package to
+OBS.
+
+Once all the dependencies have been made available in the work environment again,
+[converting the job templates](https://gitlab.apertis.org/infrastructure/apertis-jenkins-jobs/commit/59af079a3b5)
+only requires minor changes. 
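+
+Once the new image is published to the registry, developers should be able to
+enter the same packaging environment locally, in the same way as for the
+image-builder container shown earlier. This is a sketch: the exact image name
+and tag may differ from the one below.
+
+``` sh
+docker run -it \
+    docker-registry.apertis.org/apertis/apertis-18.09-package-builder \
+    /bin/bash
+```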
+
+### Converting the sample app-bundle jobs
+
+The jobs building the sample applications need `ade` and the dependencies of
+the app-bundles themselves.
+
+The changes required to
+[switch the job template to use Docker](https://gitlab.apertis.org/infrastructure/apertis-jenkins-jobs/commit/c9562929b04)
+are pretty similar to the ones required by the packaging jobs.
+
+### Converting the build-package-all-masters job
+
+This job's purpose is to check that no API breakage is introduced in
+the application framework and HMI packages by building them from sources
+in sequence.
+
+The changes required to
+[switch the job template to use Docker](https://gitlab.apertis.org/infrastructure/apertis-jenkins-jobs/commit/c9582cd56b8)
+are pretty similar to the ones required by the packaging jobs.
+
+### Converting the Phabricator jobs
+
+While the plan is to officially switch to GitLab for all the code reviews, the
+jobs used to validate the patches submitted to Phabricator need to be ported to
+avoid regressions.
+
+The changes [to port them to Docker](https://gitlab.apertis.org/infrastructure/apertis-jenkins-jobs/commit/60eb971fc0c)
+are similar to the ones for the other jobs, but additional fixes are needed to
+ensure they work smoothly in ephemeral Docker containers,
+[relaxing ssh host key checking](https://gitlab.apertis.org/infrastructure/apertis-jenkins-jobs/commit/3213294757f) and
+[avoiding the interactive behavior of git-phab](https://gitlab.apertis.org/infrastructure/apertis-jenkins-jobs/commit/d9e932d1fd1).
+
+
+## Steps to be taken by downstreams
+
+Downstreams are likely to already have a Docker-capable worker machine for
+their Jenkins instance in order to run the Debos-based jobs.
+
+By merging the latest changes in the
+[apertis-docker-images](https://gitlab.apertis.org/infrastructure/apertis-docker-images/)
+repository, a new `package-builder` image should become available in their
+Docker registry.
+
+The updates to the templates in the
+[apertis-jenkins-jobs](https://gitlab.apertis.org/infrastructure/apertis-jenkins-jobs/)
+repository can then be merged and deployed to Jenkins to make use of the new
+Docker image.
diff --git a/content/designs/lava-external-devices.md b/content/designs/lava-external-devices.md
new file mode 100644
index 0000000000000000000000000000000000000000..efc461d904e8d014a0903b91e95838e31524a2d1
--- /dev/null
+++ b/content/designs/lava-external-devices.md
@@ -0,0 +1,407 @@
+---
+title: LAVA External Device Monitoring
+short-description: LAVA test monitoring for external devices
+authors:
+  - name: Luis Araujo
+---
+
+# LAVA External Device Monitoring
+
+This document describes how to execute automated LAVA tests that control
+resources external to the DUT across a network, by implementing a LAVA
+parallel pipeline job.
+
+# Test Cases
+
+The approach proposed in this document will help to address test cases like:
+
+ - Executing a test in the DUT where certain power states are simulated (for
+   example a power loss) during specific test actions using a programmable PSU
+   external to the DUT.
+
+ - Executing a test in the DUT simulating SD card insertion and removal using
+   an external device.
+
+The only assumption made in this document, in both scenarios, is that the
+external device (either a programmable PSU or an SD-card simulator) can be
+accessed through the network using SSH. 
+
+# LAVA Features
+
+LAVA offers the following features that can be combined to implement a solution
+for the test cases mentioned in this document:
+
+ - LXC to deploy required software and tools to access the external device.
+ - MultiNode to communicate data between job actions.
+ - Secondary connections for executing tests through SSH.
+
+## LXC
+
+LAVA supports LXC containers both as a standalone device type and as dynamic
+transparent environments in order to interact with external devices. In either
+case the [LXC Protocol] is used.
+
+## MultiNode
+
+The [MultiNode Protocol] allows data to be shared between actions, including data
+generated in one test shell definition being made available over the protocol to
+a deploy or boot action of jobs with a different role.
+
+Synchronisation is done using the MultiNode API, specifically the `lava-send` and
+`lava-wait` calls.
+
+## Secondary Connections
+
+LAVA allows [Secondary Connections] to open network connections to external
+devices using MultiNode submissions.
+
+# Approach Overview
+
+The main idea is to create an LXC container device associated with the DUT
+responsible for executing the automated test, then open an SSH connection to
+an external device, and use the MultiNode API to synchronize both devices and
+pass data between them, with the LXC container serving as a coordinator of
+the different LAVA test actions.
+
+In this way, a server-client layout is set up that will help to execute tests
+on a board attached to LAVA (server side) with the intervention of external
+devices (client side).
+
+# LAVA Job Connection Layout
+
+The LXC container is deployed directly from the LAVA dispatcher and coordinates
+the execution of the parallel pipeline between the DUT and the external device
+(secondary connection) from there.
+
+The layout model would be something like:
+
+                 ------------- DUT
+               /  MultiNode
+    LAVA (LXC)
+               \
+                 ------------- Secondary Connection (PSU, SD-Card HW)
+                  MultiNode
+
+## Test Job
+
+This section shows the basics proposed in this document using a LAVA job file
+example.
+
+The following steps describe the main flow of the job:
+
+1 - Create two types of roles, `host` and `guest`. The `host` role will contain
+    the LXC container and the DUT, the `guest` role will label the SSH
+    connection for the external device. This creates two groups (`host` and
+    `guest`) that can communicate using the MultiNode API, so messages can be
+    sent between the LXC and the device as the server and the secondary
+    connection as the client.
+
+2 - Label both types of roles in the `protocols` section of the job.
+
+3 - Deploy and boot the `LXC` container (`host`).
+
+4 - Execute a test in the LXC container using the MultiNode API to send the
+    `lava_start` message, so the `deploy` action for the external device can
+    start, and wait for the remaining clients to start using the `lava-sync`
+    call.
+
+5 - Deploy the DUT (`host`).
+
+6 - Deploy the external device (`guest`), which is waiting for the LXC
+    `lava_start` message to start deployment. Once this message is received,
+    the guest device is deployed.
+
+7 - Boot the DUT.
+
+8 - Boot the external device.
+
+9 - Execute a test in the DUT sending the `lava-sync` call.
+
+10 - Execute a test in the external device sending the `lava-sync` call.
+
+11 - Once all clients are synchronized (the LXC, DUT and external device), start
+     executing tests.
+
+12 - Tests executed in the DUT and external device need to use the [MultiNodeAPI]
+     in order to pass data between them. 
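+
+The MultiNode synchronisation primitives used in these steps are plain
+commands available inside the test shell. In essence, and using the same
+message IDs as the job file below:
+
+``` sh
+# Host (LXC) side: share the host IP address and signal that guests may start
+lava-send ipv4 ipaddr=$(lava-echo-ipv4 eth0)
+lava-send lava_start
+
+# Guest side: block until the host has published its address
+lava-wait ipv4
+
+# Any role: wait until every client has reached the same point
+lava-sync clients
+```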
+
+As the LXC is deployed and booted first, the LXC can run a test shell before
+deploying the device, before booting the device, before the test shell action
+on the device which starts the secondary connection guests, or at any later
+point ([AddingTestsActions]).
+
+### Job File Example
+
+``` yaml
+job_name: LXC and Secondary connection with a Device
+
+timeouts:
+  job:
+    minutes: 30
+  action:
+    minutes: 3
+  connection:
+    minutes: 5
+priority: medium
+visibility: public
+
+protocols:
+  lava-lxc:
+    host:
+      name: lxc-ssh-test
+      template: debian
+      distribution: debian
+      release: stretch
+  lava-multinode:
+    # expect_role is used by the dispatcher and is part of delay_start
+    # host_role is used by the scheduler, unrelated to delay_start.
+    roles:
+      host:
+        device_type: beaglebone-black
+        # This makes this role essential in order to execute the test.
+        essential: True
+        count: 1
+        timeout:
+          minutes: 10
+      guest:
+        # protocol API call to make during protocol setup
+        request: lava-start
+        # set the role for which this role will wait
+        expect_role: host
+        timeout:
+          minutes: 15
+        # no device_type, just a connection
+        connection: ssh
+        count: 3
+        # each ssh connection will attempt to connect to the device of role 'host'
+        host_role: host
+
+actions:
+- deploy:
+    role:
+    - host
+    namespace: probe
+    timeout:
+      minutes: 5
+    to: lxc
+    # authorize for ssh adds the ssh public key to authorized_keys
+    authorize: ssh
+    packages:
+    - usbutils
+    - procps
+    - lsb-release
+    - util-linux
+    - ntpdate
+    - openssh-server
+    - net-tools
+
+- boot:
+    role:
+    - host
+    namespace: probe
+    prompts:
+    - 'root@(.*):/#'
+    timeout:
+      minutes: 5
+    method: lxc
+
+- test:
+    role:
+    - host
+    namespace: probe
+    timeout:
+      minutes: 5
+    definitions:
+    - repository:
+        metadata:
+          format: Lava-Test Test Definition 1.0
+          name: network
+          description: "Send message ID"
+        run:
+          steps:
+          - lava-test-case ntpdate --shell ntpdate-debian
+          - lava-echo-ipv4 eth0
+          - lava-send ipv4 ipaddr=$(lava-echo-ipv4 eth0)
+          - lava-send lava_start
+          - lava-sync clients
+      from: inline
+      name: lxc-test
+      path: inline/lxc-test.yaml
+
+# DUT actions
+- deploy:
+    role:
+    - host
+    namespace: device
+    timeout:
+      minutes: 5
+    to: tftp
+
+    kernel:
+      url: https://files.lavasoftware.org/components/lava/standard/debian/stretch/armhf/3/vmlinuz-4.9.0-4-armmp
+      sha256sum: b6043cc5a07e2cead3f7f098018e7706ea7840eece2a456ba5fcfaddaf98a21e
+      type: zimage
+    ramdisk:
+      url: https://files.lavasoftware.org/components/lava/standard/debian/stretch/armhf/3/initrd.img-4.9.0-4-armmp
+      sha256sum: 4cc25f499ae74e72b5d74c9c5e65e143de8c2e3b019f5d1781abbf519479b843
+      compression: gz
+    modules:
+      url: https://files.lavasoftware.org/components/lava/standard/debian/stretch/armhf/3/modules.tar.gz
+      sha256sum: 10e6930e9282dd44905cfd3f3a2d5a5058a1d400374afb2619412554e1067d58
+      compression: gz
+    nfsrootfs:
+      url: https://files.lavasoftware.org/components/lava/standard/debian/stretch/armhf/3/stretch-armhf-nfs.tar.gz
+      sha256sum: 46d18f339ac973359e8ac507e5258b620709add94cf5e09a858d936ace38f698
+      compression: gz
+    dtb:
+      url: https://files.lavasoftware.org/components/lava/standard/debian/stretch/armhf/3/dtbs/am335x-boneblack.dtb
+      sha256sum: c4c461712bf52af7d020e78678e20fc946f1d9b9552ef26fd07ae85c5373ece9
+
+- deploy:
+    role:
+    - guest
+    namespace: guest
+    # Timeout for the ssh connection attempt
+    timeout:
+      seconds: 30
+    to: ssh
+    connection: ssh
+    protocols:
+      lava-multinode:
+      - action: prepare-scp-overlay
+        request: lava-wait
+        messageID: ipv4
+        message:
+          ipaddr: $ipaddr
+        timeout: # delay_start timeout
+          minutes: 5
+
+- boot:
+    role:
+    - host
+    namespace: device
+    timeout:
+      minutes: 15
+    method: u-boot
+    commands: nfs
+    auto_login:
+      login_prompt: 'login:'
+      username: root
+    prompts:
+    - 'root@stretch:'
+    parameters:
+      shutdown-message: "reboot: Restarting system"
+
+- boot:
+    role:
+    - guest
+    namespace: guest
+    timeout:
+      minutes: 3
+    prompts:
+    - 'root@stretch:'
+    parameters:
+      hostID: ipv4
+      host_key: ipaddr
+    method: ssh
+    connection: ssh
+
+- test:
+    role:
+    - host
+    namespace: device
+    timeout:
+      minutes: 30
+    definitions:
+    - repository:
+        metadata:
+          format: Lava-Test Test Definition 1.0
+          name: install-ssh
+          description: "install step"
+        run:
+          steps:
+          - df -h
+          - free
+          - lava-sync clients
+      from: inline
+      name: ssh-inline
+      path: inline/ssh-install.yaml
+    - repository: http://git.linaro.org/lava-team/lava-functional-tests.git
+      from: git
+      path: lava-test-shell/smoke-tests-basic.yaml
+      name: smoke-tests
+    - repository: http://git.linaro.org/lava-team/lava-functional-tests.git
+      from: git
+      path: lava-test-shell/single-node/singlenode02.yaml
+      name: singlenode-intermediate
+
+- test:
+    role:
+    - guest
+    namespace: guest
+    timeout:
+      minutes: 5
+    definitions:
+    - repository: http://git.linaro.org/lava-team/lava-functional-tests.git
+      from: git
+      path: lava-test-shell/smoke-tests-basic.yaml
+      name: smoke-tests
+    # run the inline last as the host is waiting for this final sync.
+    - repository:
+        metadata:
+          format: Lava-Test Test Definition 1.0
+          name: client-ssh
+          description: "client complete"
+        run:
+          steps:
+          - df -h
+          - free
+          - lava-sync clients
+      from: inline
+      name: ssh-client
+      path: inline/ssh-client.yaml
+
+#
+# Tests executed in the external device and DUT can be added here.
+# They all need to use the MultiNode API.
+#
+
+# Execute test in the DUT
+- test:
+    role:
+    - host
+    namespace: device
+    timeout:
+      minutes: 10
+    definitions:
+    - repository: https://gitlab.apertis.org/tests/apertis-test-cases/
+      from: git
+      path: lava-test-shell/single-node/singlenode03.yaml
+      name: singlenode-advanced
+
+# Execute test in the external device (PSU, SD-card device)
+- test:
+    role:
+    - guest
+    namespace: guest
+    timeout:
+      minutes: 10
+    definitions:
+    - repository: https://gitlab.apertis.org/tests/apertis-test-cases/
+      from: git
+      path: lava-test-shell/single-node/singlenode03.yaml
+      name: singlenode-advanced
+```
+
+# QA Report
+
+Once test results are available in LAVA, and the test cases are enabled for
+the specific images in the test case repository, the results will
+automatically be available from the QA Report App. 
+
+[MultiNode Protocol]: https://lava.collabora.co.uk/static/docs/v2/actions-protocols.html#multinode-protocol
+
+[MultiNodeAPI]: https://lava.collabora.co.uk/static/docs/v2/multinodeapi.html#multinode-api
+
+[LXC Protocol]: https://lava.collabora.co.uk/static/docs/v2/actions-protocols.html#lxc-protocol-reference
+
+[Secondary Connections]: https://lava.collabora.co.uk/static/docs/v2/pipeline-writer-secondary.html
+
+[AddingTestsActions]: https://lava.collabora.co.uk/static/docs/v2/writing-multinode.html#adding-test-actions
diff --git a/content/designs/license-applying.md b/content/designs/license-applying.md
new file mode 100644
index 0000000000000000000000000000000000000000..c7cb8fdb5fe42ba6d2dbf3f69dcef3787371218e
--- /dev/null
+++ b/content/designs/license-applying.md
@@ -0,0 +1,256 @@
+# Applying Licensing
+
+Apertis code, including build scripts, helpers and recipes, is licensed under the
+[Mozilla Public License Version 2.0](https://www.mozilla.org/en-US/MPL/2.0/).
+Images (such as icons) and documentation in Apertis are licensed under the
+[Creative Commons Attribution-ShareAlike 4.0 International](https://creativecommons.org/licenses/by-sa/4.0/)
+(CC BY-SA 4.0) license.
+
+When you contribute to any Apertis code repository, you are agreeing to license
+your work under the same license as the rest of the code in the repository.
+
+If you are
+[creating a new Apertis project](https://wiki.apertis.org/Guidelines/Module_setup),
+the code must be licensed under the MPL 2.0, unless there’s a good reason for it
+to be licensed differently.
+
+Apertis also makes use of other projects which may have other licenses, such as
+the [GPL and LGPL](https://www.gnu.org/licenses/licenses.html). For example,
+this includes projects such as the Linux kernel, WebKit and GLib.
+
+
+## Licensing of code
+
+There are two parts to licensing a project:
+
+* distribute the license text
+* include license headers in each file
+
+### Distribute the license file
+
+The license text is normally distributed in the `COPYING.MPL` or `COPYING` file
+which lives in the top directory in the git repository for the project. This
+file will contain the full license text, as listed at
+[https://www.mozilla.org/media/MPL/2.0/index.815ca599c9df.txt], without any
+modifications or changes. For example, see the
+[newport COPYING file](https://gitlab.apertis.org/appfw/newport/blob/apertis/v2019/COPYING).
+
+While `COPYING` is a more common filename to use, `COPYING.MPL` accounts for
+the case where there may be files in the project under a different license
+which would require multiple `COPYING.*` files to be included. This case is
+most common with applications which may include content such as logos, images
+and documentation under different licenses.
+
+
+#### Distributing portions under different licenses
+
+It is very common to see only one `COPYING` file in a project which contains
+only a single license text, and it is also common to see the images and
+documentation shipped either under a license which is best suited for code
+(that is to say, impossible for images and documentation to comply with) or
+without proper licensing.
+
+Licensing all parts of your project appropriately is not complicated and we
+highly recommend that you do so. Your typical directory structure should look
+something like:
+
+```
+<project>
+↳COPYING
+↳COPYING.MPL
+```
+
+The `COPYING` file should contain information about all parts of the project. 
+
+For example, it could look like:
+
+```
+<project> is an Apertis project and follows the licensing guidelines as
+specified at https://wiki.apertis.org/Licensing.
+
+Code
+----
+All code in this project is licensed under the Mozilla Public License Version
+2.0. See COPYING.MPL for the full license text.
+
+Images
+------
+All icons and other images in this project are licensed under CC BY-SA 4.0
+International. For information about this license, see
+https://creativecommons.org/licenses/by-sa/4.0/
+
+Documentation
+-------------
+All documentation in this project is licensed under CC BY-SA 4.0 International.
+For information about the license, see
+https://creativecommons.org/licenses/by-sa/4.0/
+```
+
+Your `COPYING.MPL` should contain the full license text for the Mozilla Public
+License Version 2.0. You may also need to have other license-specific `COPYING`
+files, depending on your project.
+
+In this case, we include a `COPYING.MPL` to comply with the MPL 2.0 as it
+requires the full license text to be included in your project, but we do not
+have a `COPYING.CC-BY-SA` because the CC BY-SA 4.0 license does not require the
+license text to be distributed (but you may include it if you wish to do so).
+
+
+### Add license headers to each file
+
+A `license header` is a comment which is added to the top of a code file. It
+consists of a `copyright notice`, the SPDX license identifier and a license
+blurb which is provided with the license. The license header for a specific
+file must contain only copyright holders of content which is in that file. This
+means that the license header in each of your project files is likely to list
+different copyright holders.
+
+The copyright notice will normally contain `Copyright ©` followed by the
+copyright years and the copyright holder. It is recommended that you also
+include a contact email address for the copyright holder, although this is
+optional.
+
+If you are employed to contribute to Apertis, the copyright holder may be
+either you or your employer. We recommend that you check with your employer
+before you contribute, since it may not be possible to completely remove any
+mistakes later: the code is publicly available and archived.
+
+This is what a typical MPL license header looks like:
+
+```
+/*
+ * Copyright © 2015, 2016 Anita Developer <a.developer@example.com>
+ *
+ * SPDX-License-Identifier: MPL-2.0
+ * This Source Code Form is subject to the terms of the Mozilla Public
+ * License, v. 2.0. If a copy of the MPL was not distributed with this
+ * file, You can obtain one at http://mozilla.org/MPL/2.0/.
+ */
+```
+
+For additional guidance on how license headers work, please read the
+[GNU license guidance](https://www.gnu.org/licenses/gpl-howto.html). The
+theory of using the MPL license headers is the same as for the GPL, but do keep
+in mind that the GPL/GNU licenses have different content from the MPL license.
+
+
+#### Copyright notice date range
+
+The copyright notice should always correspond to the years in which the work
+was done.
+
+For example:
+
+* work done in 2015 should have `© 2015`
+* work done in 2016 should have `© 2016`
+* files which had work done in 2015 and 2016 should have `© 2015–2016`
+* files which had work done in 2014 and 2016 should have `© 2014, 2016`
+* files which had work done in 2013, 2015 and 2016 should have `© 2013, 2015–2016`
+
+Your copyright notice should normally look something like:
+
+```
+Copyright © 2016 Anita Developer <a.developer@example.com>
+```
+
+For documentation written in Mallard, you should use the `<credit>`, `<name>`,
+`<email>` and `<years>` tags, which will generate the correct copyright notice
+automatically.
+
+The copyright holder will normally be you or, if you make the contribution as
+part of paid work, your employer. If you are unsure about this, you should
+check what your employment contract states on the matter or seek further legal
+advice.
+
+You must not amend copyright notices which were inserted by other people
+without their explicit permission, and that permission must be recorded
+appropriately.
+
+Apart from the license header, you should also include the
+[vim modeline at the top of the file](https://designs.apertis.org/latest/coding_conventions.html#code-formatting)
+to help enforce a consistent coding style.
+
+
+### Add a new code file to a project
+
+Each code file in all Apertis repositories must contain the license header.
+This license header must be added in the commit which first adds the file to
+the project and will typically contain your copyright notice.
+
+Always double check the project license before adding a license header: not all
+projects are licensed under the MPL! This is most likely to be the case for
+repositories which track upstream projects that have Apertis-specific
+customisations applied to them. You can find the project license in the
+`COPYING` or `COPYING.*` files. If unsure, do ask the project maintainer for
+help. You can find the list of maintainers in the `.doap` file in the project
+git repository.
+
+
+### Make changes to an existing code file
+
+When you make a copyrightable change to a file in an existing project, you will
+need to add your copyright notice to the existing copyright header, but make
+sure that you do not amend or change the license notice in any way! Add your
+notice below the existing copyright notices, but above the license notice.
+
+For example, if your copyright notice was
+`Copyright © 2016 Andrew Contributor <a.contributor@example.com>` then the
+resulting copyright header would look like:
+
+```
+/*
+ * Copyright © 2015, 2016 Anita Developer <a.developer@example.com>
+ * Copyright © 2016 Andrew Contributor <a.contributor@example.com>
+ *
+ * SPDX-License-Identifier: MPL-2.0
+ * This Source Code Form is subject to the terms of the Mozilla Public
+ * License, v. 2.0. If a copy of the MPL was not distributed with this
+ * file, You can obtain one at http://mozilla.org/MPL/2.0/.
+ */
+```
+
+## License for images
+
+As with code, there are two parts to licensing your images:
+
+* include mention of the image licensing in the COPYING (recommended) or README
+  file, as covered in [][Distributing portions under different licenses]
+* add the license to the image metadata in case the image becomes separated
+  from the repository
+
+### Add the license to the metadata
+
+You can use `exiv2`, which is a command-line tool, to write Exif metadata into
+the file.
`exiv2` should be available through your Linux distribution, or you
+can [download](http://www.exiv2.org/download.html) it for Linux or Windows from
+its website.
+
+For example, if your copyright notice is
+`© 2016 Alice Artist <a.artist@example.com>` then this command will add it to
+the `Exif.Image.Copyright` key:
+
+```
+exiv2 -v -M"set Exif.Image.Copyright Copyright © 2016 Alice Artist <a.artist@example.com>. This work is licensed under the Creative Commons Attribution-ShareAlike 4.0 International License. To view a copy of this license, visit http://creativecommons.org/licenses/by-sa/4.0/ or send a letter to Creative Commons, PO Box 1866, Mountain View, CA 94042, USA." <path to your image>
+```
+
+Replace the copyright notice with your own and replace `<path to your image>`
+with the path to the image which you want to update.
+
+You can now check the copyright notice with:
+
+```
+exiv2 <path to your image>
+```
+
+which will output something that looks like:
+
+```
+File name       : apertis-icon.png
+File size       : 1228 Bytes
+MIME type       : image/png
+Image size      : 36 x 36
+Thumbnail       : None
+Copyright       : Copyright © 2016 Alice Artist <a.artist@example.com>. This
+work is licensed under the Creative Commons Attribution-ShareAlike 4.0
+International License. To view a copy of this license, visit
+http://creativecommons.org/licenses/by-sa/4.0/ or send a letter to Creative
+Commons, PO Box 1866, Mountain View, CA 94042, USA.
+Exif comment    :
+```
+
+There may be some other tags present in the output.
+
diff --git a/content/designs/license-exceptions.md b/content/designs/license-exceptions.md
new file mode 100644
index 0000000000000000000000000000000000000000..aff3a3fd97d570365de31f5c92db130810baba6a
--- /dev/null
+++ b/content/designs/license-exceptions.md
@@ -0,0 +1,202 @@
+---
+title: Apertis License exceptions
+short-description: Document license exceptions for projects in Apertis
+authors:
+  - name: Andrej Shadura
+  - name: Emanuele Aina
+  - name: Frederic Dalleau
+  - name: Sjoerd Simons
+---
+
+# License exceptions
+
+License exceptions for Apertis are listed below.
+
+Each exception must provide the following information:
+
+<table>
+<colgroup>
+  <col style="width: 20%" />
+  <col style="width: 80%" />
+</colgroup>
+<tr>
+  <th>project</th>
+  <td>The project name</td>
+</tr>
+<tr>
+  <th>component</th>
+  <td>The repository component the exception applies to, for example
+      <code>apertis:*:target</code></td>
+</tr>
+<tr>
+  <th>date</th>
+  <td>The date at which the exception was added to this document</td>
+</tr>
+<tr>
+  <th>validator</th>
+  <td>The name of the person who validated the exception</td>
+</tr>
+<tr>
+  <th>rule</th>
+  <td>The rules that are waived by this exception</td>
+</tr>
+<tr>
+  <th>reason</th>
+  <td>A description of why the exception is granted and makes sense</td>
+</tr>
+</table>
+
+## gcc-8
+
+<table>
+<colgroup>
+  <col style="width: 20%" />
+  <col style="width: 80%" />
+</colgroup>
+<tr>
+  <th>project</th>
+  <td>gcc-8</td>
+</tr>
+<tr>
+  <th>component</th>
+  <td>apertis:*:target</td>
+</tr>
+<tr>
+  <th>date</th>
+  <td>April 17, 2019</td>
+</tr>
+<tr>
+  <th>validator</th>
+  <td>fredo</td>
+</tr>
+<tr>
+  <th>rule</th>
+  <td>No GPL v3</td>
+</tr>
+<tr>
+  <th>reason</th>
+  <td>
+    <p>The GCC source package is granted an exception to be present in the target repository component
+    because it produces binary packages covered by different licensing terms:</p>
+    <ul>
+      <li>the compiler packages are released under the GPL-3</li>
+      <li>the <code>libgcc</code> runtime library is covered by the
+        <a href="https://www.gnu.org/licenses/gcc-exception-3.1-faq.html">GCC Runtime Library Exceptions</a>
+      </li>
+    </ul>
+    <p>Programs compiled with GCC link to the <code>libgcc</code> library to implement some compiler intrinsics,
+    which means that <code>libgcc</code> must live in the <code>apertis:*:target</code> component
+    since it is a direct runtime dependency of packages in the same component.</p>
+    <p>For this reason, an exception is granted to the <code>gcc</code> source package
+    on the grounds that:</p>
+    <ul>
+      <li>the code that is shipped on target devices (that is, <code>libgcc</code>) is covered by the
+        <a href="https://www.gnu.org/licenses/gcc-exception-3.1-faq.html">GCC Runtime Library Exceptions</a>
+      </li>
+      <li>the pure GPL-3 code is not meant to be shipped on target devices</li>
+    </ul>
+  </td>
+</tr>
+</table>
+
+## libtool
+
+<table>
+<colgroup>
+  <col style="width: 20%" />
+  <col style="width: 80%" />
+</colgroup>
+<tr>
+  <th>project</th>
+  <td>libtool</td>
+</tr>
+<tr>
+  <th>component</th>
+  <td>apertis:*:target</td>
+</tr>
+<tr>
+  <th>date</th>
+  <td>August 05, 2019</td>
+</tr>
+<tr>
+  <th>validator</th>
+  <td>ritesh</td>
+</tr>
+<tr>
+  <th>rule</th>
+  <td>No GPL v3</td>
+</tr>
+<tr>
+  <th>reason</th>
+  <td>
+    libtool is granted an exception to be present in the target repository
+    component because all of its source files are licensed under the GPLv2,
+    with the exception of the build files, which are licensed under the GPLv3.
+    The build files are only used to build the binary packages, so their
+    GPLv3 terms do not extend to the built binary packages.
+  </td>
+</tr>
+</table>
+
+## elfutils
+
+<table>
+<colgroup>
+  <col style="width: 20%" />
+  <col style="width: 80%" />
+</colgroup>
+<tr>
+  <th>project</th>
+  <td>elfutils</td>
+</tr>
+<tr>
+  <th>component</th>
+  <td>apertis:*:target</td>
+</tr>
+<tr>
+  <th>date</th>
+  <td>September 17, 2019</td>
+</tr>
+<tr>
+  <th>validator</th>
+  <td>andrewsh</td>
+</tr>
+<tr>
+  <th>rule</th>
+  <td>No GPL v3</td>
+</tr>
+<tr>
+  <th>reason</th>
+  <td>
+    <p><code>elfutils</code> is software dual-licensed as LGPL-3+ or GPL-2+, which
+    means that any
combined work using it has to be shipped under terms
+    compatible with either of those two licenses. To avoid the effects of the
+    GPL-3 provisions as required for the <code>target</code> repository, any
+    combined work depending on any of the libraries provided by
+    <code>elfutils</code> must be effectively licensed under the GPL-2 terms.
+    </p>
+    <p>
+    The following binary packages link against <code>elfutils</code> in
+    a way that means no GPL-3 restrictions need to be applied, as they only ship
+    executables that produce combined works under the GPL-2:
+    </p>
+    <ul>
+      <li><code>linux-perf-4.19</code>: GPL-2, does not ship libraries,
+        development tool not meant to be shipped on products
+      <li><code>linux-kbuild-4.19</code>: GPL-2, does not ship libraries,
+        development tool not meant to be shipped on products
+      <li><code>bluez</code>: GPL-2, does not ship libraries
+      <li><code>libglib2.0-bin</code>: LGPL-2.1, effectively combined to GPL-2,
+        does not ship libraries
+    </ul>
+    <p>
+    In addition, the <code>mesa</code> source package produces binary packages
+    containing drivers that need to be linked to <code>libelf</code> and, in
+    turn, get linked to graphical applications. This would impose LGPL-3+
+    restrictions on <code>libelf</code> unless the application and all the other
+    linked libraries can be combined as a GPL-2 work. This is not an acceptable
+    restriction, so the affected drivers have been disabled, and no binary
+    package produced from the <code>mesa</code> source package links to any
+    library shipped by <code>elfutils</code>.
+    </p>
+  </td>
+</tr>
+</table>
diff --git a/content/designs/license-expectations.md b/content/designs/license-expectations.md
new file mode 100644
index 0000000000000000000000000000000000000000..5a84cb421e0721ec0d703f6d5c6501fba6727a10
--- /dev/null
+++ b/content/designs/license-expectations.md
@@ -0,0 +1,302 @@
+---
+title: Open source License expectations
+short-description: Document license obligations for projects in Apertis
+authors:
+  - name: Emanuele Aina
+  - name: Frederic Dalleau
+  - name: Sjoerd Simons
+  - name: Peter Senna Tschudin
+---
+
+# License expectations
+
+## Introduction
+
+The license is an important element in open source projects, as it defines
+acceptable use cases, user rights, and contribution guidelines. There are
+different ways to identify the license from the project source code, such
+as SPDX headers, the LICENSE file, and the COPYING file. However, an open source
+project may contain files from other projects and may use different licenses
+for different files.
+
+### Apertis goals
+
+Apertis aims to accomplish the following goals:
+
+- Ensure that all the software shipped in Apertis is open source or at least freely
+distributable, so that downstreams are entitled to
+use, modify and redistribute work derived from our deliverables.
+- Ensure that Apertis images targeting devices (such as target and minimal)
+are not subject to licensing constraints that may conflict with the regulatory
+requirements of some intended use cases.
+
+In order to reach these goals, the following assumptions are made:
+- *licenses declared by open source projects are correct:* the software
+  authors correctly document the licensing of their released software sources,
+  and they have all the rights to distribute it under the documented terms.
+- *licenses verified by the Debian project are correct:* the package
+  distributors (that is, Debian maintainers and the FTP Masters team) check that
+  the licensing terms provided by the software authors are open source using the
+  definitions in the [Debian Free Software Guidelines][DFSG], and ensure those
+  terms are documented in a canonical location (debian/copyright in the package
+  sources).
+
+### Licensing constraints
+
+Version 3 of the GPL license was created to address the concern of users
+who were prevented from running modified code on their device when the
+device was shipped with OSS.
+A common method for preventing users from running their
+own code is signature verification. This practice is known as
+[Tivoization](https://en.wikipedia.org/wiki/Tivoization).
+Those licensing rules are a
+constraint because, in some application domains, it is a regulatory (or safety)
+requirement to ensure that the hardware runs verified software.
+
+## Ensuring continuous maintenance of open source license documentation
+
+Maintaining the open source license documentation is an incremental process:
+
+At the time of a rebase, licenses are checked manually for all packages involved
+in the rebase. This covers the whole archive.
+
+During development, updates are monitored. The integration of a new
+project into Apertis and the update of source code are the operations that can
+result in a license change. New projects can be integrated at any time
+in Apertis. If new sources for a project already in Apertis are received, the
+license of the project can change, or the licensing of some distributables
+within this project can differ from the prevalent license.
+
+From a project perspective, the Apertis team tries to do a full scan of all
+projects in each release cycle.
+
+Open source software shipped with devices that users buy adds significant
+licensing constraints to the software stack of preview and product releases.
+These constraints do not affect development releases, and it is possible to
+save some work on those releases.
+
+In an ideal situation, regular checks of the whole archive would be automated to
+ensure nothing escaped the manual checks. While the Apertis maintainers are
+already manually checking packages, the automated whole-archive checks are not
+currently implemented. [Future improvements] presents a possible solution.
+
+## Apertis Licensing expectations
+
+### General rules of the Apertis project and their specific constraints
+
+The [Debian Free Software Guidelines][DFSG] define expectations for the
+licenses of the projects that are integrated in Debian. They serve as a base
+for the Apertis policy. The DFSG can be read in the [Appendix] section of this
+document.
+
+On top of the DFSG expectations, Apertis defines additional rules for specific
+components of its package repository, which are described in [Apertis specific
+rules]. In particular, the components in the Apertis package repository are meant
+to group the packages that are installed on images for target devices, and should
+thus be free of [licensing constraints].
+
+Debian packages in a repository are organized in components.
+A component is a group of packages sharing a common policy.
+A single image can incorporate packages from different components.
+
+### Apertis Repository component specific rules
+
+The canonical source of licensing information is this document. Each
+repository component is listed here, with the rules that apply.
+
+Each component contains several source packages, and each source package can
+generate multiple binary packages. For example, in a client-server project, it's
+possible for a source package to generate two binary packages: one for the
+server side of the project, and one for the client side. Each binary package can
+have a different license.
+
+For current Apertis releases, the following components exist:
+- target: contains packages for the final devices,
+- hmi: contains user interface packages,
+- sdk: contains packages specific to the SDK,
+- development: contains packages useful for developers.
+
+The license expectations for each of those components are defined below.
+Any package outside these expectations should be documented.
+
+#### target
+
+This component ships source packages producing binary packages used in images
+deployable on target devices. For a file in a binary package to be considered
+an artifact, the file must have been generated/compiled/translated from a
+source package. An artifact can be an executable, a library, or any other file
+that is subject to a license. Specifically, the binary packages installed on
+those images should not be affected by licensing constraints. This does not
+mean that every source or binary package in the component must be completely
+unrestricted:
+* source packages may contain restricted build scripts, provided that their license
+  does not affect the generated artifacts
+* source packages may contain restricted tests or utilities, provided that they
+  are not shipped in the same package as the unrestricted artifacts installed on
+  target images
+* binary packages may contain restricted artifacts, provided that they are built
+  from a source package also producing unrestricted packages that are shipped
+  on target images
+* binary packages may contain restricted artifacts with added
+  exceptions.
+  The [GCC Runtime Library Exception](https://www.gnu.org/licenses/gcc-exception-3.1-faq.html)
+  covering `libgcc` is the main example.
+  Those exceptions should be documented.
+
+#### hmi
+
+This component has the same usage and constraints as the target component.
+
+#### sdk
+
+This component ships source packages producing binary packages meant for
+SDK images. Since the packages hosted in this component
+are only meant for development purposes, no further requirement is imposed
+other than the DFSG ones.
+
+#### development
+
+This component provides the packages needed to build the packages in the
+`target` repository component but that are not meant to be installed on target
+devices. Build tools like GNU binutils, the GNU Autotools, or Meson are hosted
+in this component.
+
+Dependencies of packages in the target component that are not meant to be
+installed on target images are also hosted in this component. For instance,
+many source packages in the target component also build a binary package
+containing their tests, which is not intended to be part of the target images:
+the extra dependencies required by the test package but not by the main package
+are hosted in the development component.
+
+The development component also hosts development tools that are not part of the
+target images by default, but that may be useful to install manually on target
+devices during development. Tools like `strace`, `tcpdump` or `bash` belong to
+this category.
+
+Since those packages are exclusively intended for development purposes within
+the Apertis development team, no further requirement is imposed other than the
+DFSG ones.
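+
+As a purely illustrative sketch of how these rules combine (the package
+names below are invented for the example, not actual Apertis packages), a
+single source package hosted in the target component might spread its binary
+packages across components like this:
+
+```
+src:foo (hosted in apertis:*:target)
+ ├─ foo        → target       (unrestricted, installed on target images)
+ ├─ foo-doc    → development  (not meant to be installed on target images)
+ └─ foo-tests  → development  (test-only; its extra dependencies are
+                               also hosted in development)
+```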
+
+## Auditing the license of a project
+
+Auditing the license of an imported package depends on the type of the project.
+
+For Debian packages, the Debian licensing information gives a good indication of
+whether a project can be integrated in Apertis. Debian maintainers take great
+care to ensure that what they redistribute is redistributable. Using the Debian
+licensing information provides many benefits:
+- vetting of licensing terms to ensure they are open source (in particular, as
+defined in the DFSG)
+- ensuring that non-DFSG-compliant items are excluded from the source code
+- a standardized location for the licensing information (that is,
+`debian/copyright` in the package source)
+- an ongoing effort to make the provided licensing information machine-readable
+(DEP-5)
+
+Some projects may not be packaged by Debian. In this case, the project source
+code should contain a document stating the license. Any project that does not
+provide license information should not be redistributed.
+
+## Documenting exceptions
+
+For Apertis, the list of exceptions should mention:
+- The project location in Apertis, mainly GitLab or OBS
+- The project source package name
+- The project component
+- The rule the project does not meet that requires the exception
+- The reason behind the exception
+- The date at which the exception was made
+- The name of the person who validated the exception
+
+The canonical source of licensing exceptions is the [license exceptions]
+document.
+
+Apertis-derived projects should provide an equivalent location for their
+specific exceptions.
+
+## Future improvements
+
+Manually checking licenses will not scale and may not be done in a deterministic
+way. Introducing automation is key.
+
+FOSSology is a license reporting tool, described in the [License validation]
+document. Although the developers are trusted to check licenses, the use of
+FOSSology could help ensure correct identification.
+
+An Apertis-specific FOSSology instance could be set up to scan on each commit.
+The upload of package sources to GitLab could trigger a FOSSology scan in order
+to scan them before the sources reach the OBS repositories. The CI could then
+integrate the DEP-5 machine-readable copyright information from FOSSology
+into the built package. This information could then be extracted and a
+bill-of-materials list generated for each artifact (images, container
+images, etc.).
+
+## Appendix
+
+### The Debian Free Software Guidelines (DFSG)
+
+1. Free Redistribution
+
+The license of a Debian component may not restrict any
+party from selling or giving away the software as a component of an aggregate
+software distribution containing programs from several different sources. The
+license may not require a royalty or other fee for such sale.
+
+2. Source Code
+
+The program must include source code, and must allow distribution in source
+code as well as compiled form.
+
+3. Derived Works
+
+The license must allow modifications and derived works, and must allow them to
+be distributed under the same terms as the license of the original software.
+
+4. Integrity of The Author's Source Code
+
+The license may restrict source-code from being distributed in modified form
+only if the license allows the distribution of "patch files" with the source
+code for the purpose of modifying the program at build time. The license must
+explicitly permit distribution of software built from modified source code.
The
+license may require derived works to carry a different name or version number
+from the original software. (This is a compromise. The Debian group encourages
+all authors not to restrict any files, source or binary, from being modified.)
+
+5. No Discrimination Against Persons or Groups
+
+The license must not discriminate against any person or group of persons.
+
+6. No Discrimination Against Fields of Endeavor
+
+The license must not restrict anyone from making use of the program in a
+specific field of endeavor. For example, it may not restrict the program from
+being used in a business, or from being used for genetic research.
+
+7. Distribution of License
+
+The rights attached to the program must apply to all to whom the program is
+redistributed without the need for execution of an additional license by those
+parties.
+
+8. License Must Not Be Specific to Debian
+
+The rights attached to the program must not depend on the program's being part
+of a Debian system. If the program is extracted from Debian and used or
+distributed without Debian but otherwise within the terms of the program's
+license, all parties to whom the program is redistributed should have the same
+rights as those that are granted in conjunction with the Debian system.
+
+9. License Must Not Contaminate Other Software
+
+The license must not place restrictions on other software that is distributed
+along with the licensed software. For example, the license must not insist that
+all other programs distributed on the same medium must be free software.
+
+10. Example Licenses
+
+The "GPL", "BSD", and "Artistic" licenses are examples of licenses that we
+consider "free".
+
+[DFSG]: https://www.debian.org/social_contract#guidelines
+[License validation]: license-validation.md
+[license exceptions]: https://designs.apertis.org/latest/license-exceptions.html
diff --git a/content/designs/license-validation.md b/content/designs/license-validation.md
new file mode 100644
index 0000000000000000000000000000000000000000..5f478a1f804efdd4a6fc7842994f6991575ab904
--- /dev/null
+++ b/content/designs/license-validation.md
@@ -0,0 +1,294 @@
+---
+title: License validation
+short-description: Design proposal for source license validation
+authors:
+  - name: Héctor Orón Martínez
+---
+
+# License validation
+
+## Scope
+
+The scope of this document is to describe a suitable system to deal with
+license requirements and compliance validation.
+
+## Terminology and concepts
+
+### Agent
+
+Software component responsible for the extraction of licensing information from
+source packages
+
+### Copyright
+
+Legal right created by the law of a country that grants the creator of an
+original work exclusive rights for its use and distribution
+
+### License
+
+Legal instrument (usually by way of contract law, with or without printed
+material) governing the use or redistribution of software
+
+### Ninka
+
+Standalone license scanner that can also be used as a FOSSology agent
+
+### Nomos
+
+FOSSology agent license scanner
+
+### OBS
+
+Open Build Service
+
+### OSS
+
+Open Source Software
+
+## Tools under review
+
+### Generic license check tools
+
+The tools listed below allow users to extract licensing information by scanning
+source code. They can operate at different levels of granularity, from a single
+source code file, to source tarballs, to ISO images containing source
+packages.
+
+These tools are not tied to any specific distribution and are focused on Open
+Source licenses.
+
+#### FOSSology
+
+[FOSSology](https://www.fossology.org/) is a framework, a toolbox and a web
+application for examining software packages in a multi-user environment.
+
+From the web application, or using the web API from the CLI, a user can upload
+individual files or entire software packages to be scanned. FOSSology will then
+unpack the uploaded data if necessary and run a chosen set of agents on every
+extracted file.
+
+The FOSSology framework currently focuses on licensing checks, but it could be
+used in combination with agents aimed at doing different kinds of tasks, such
+as static code analysis.
+
+In particular, its current toolkit can run licensing, copyright and export
+control scans from the command line.
+
+The web application adds a web UI and a database to provide a compliance
+workflow. In one click it can generate an SPDX file, or a README with the
+copyright notices of the shipped software.
+
+FOSSology also deduplicates the entries to be analysed, which means that it can
+scan an entire distribution, and when a new version is submitted only the files
+that actually changed will get rescanned.
+
+FOSSology has many different interesting features:
+* Regular expression scanning for licenses with Nomos
+* Text-similarity matching with Monk
+* Copyrights search
+* Export Control Codes (ECC)
+* Bucket processing
+* License reviewing
+* License text management
+* Marking a license as the main license of a software package
+* Bulk recognition: a text phrase scan to identify files with similar license
+contents that recur across multiple files
+* Aggregated file view
+* Reuse of license reviews
+* Export of information in different formats:
+  * Readme files for the distribution containing all identified license texts and
+copyright information
+  * List of files in hierarchical structure with found licenses identified by
+their short name identifiers
+  * SPDX 2.0 export using the tag-value and the RDF-(XML)-format
+  * Debian-copyright (a.k.a. DEP5) files
+
+The backend tools and scanners are written in C/C++ and the frontend web
+application is implemented in PHP.
+
+#### Ninka
+
+[Ninka](http://ninka.turingmachine.org/)
+([source](https://github.com/dmgerman/ninka)) is a lightweight license
+identification tool for source code. It is sentence-based, and provides a simple
+way to identify open source licenses in a source code file. It is capable of
+identifying several dozen different licenses (and their variations).
+
+Ninka has been designed with the following goals:
+* To be lightweight
+* To be fast
+* To avoid making errors
+
+FOSSology has recently added support for using Ninka as an agent.
+Ninka is mainly written in Perl.
+
+#### scancode-toolkit
+
+[scancode-toolkit](https://github.com/nexB/scancode-toolkit/) scans code and
+detects licenses, copyrights, package manifests and dependencies. It is used to
+discover and inventory Open Source and third-party packages used in projects,
+and can generate SPDX documents.
+
+Given a codebase in a directory, scancode will:
+
+* Collect an inventory of the code files and classify the code using file types
+* Extract files from any archive using a general purpose extractor
+* Extract texts from binary files if needed
+* Use an extensible rules engine to detect open source license text and notices
+* Use a specialized parser to capture copyright statements
+* Identify packaged code and collect metadata from packages
+* Report the results in your choice of JSON or HTML for integration with other
+tools
+* Display the results in a local HTML browser application to assist your analysis
+
+ScanCode is written in Python and also uses other open source packages.
+
+#### licensed
+
+[licensed](https://github.com/github/licensed) has been recently released by
+GitHub to check the licenses of the dependencies of a project.
+
+Modern language package managers (bower, bundler, cabal, go, npm, stack) are
+used to pull the dependency chain of a specific project.
+
+Licenses can be configured to be either accepted or rejected, easing the
+developer's task of identifying problematic dependencies when importing a new
+third-party library.
+
+### Debian centric license check tools
+
+The tools below focus on Debian-derived environments, and work with the
+[DEP5](http://dep.debian.net/deps/dep5/) `debian/copyright` file format and/or
+Debian packages.
+
+#### licensecheck
+
+[licensecheck](https://metacpan.org/pod/App::Licensecheck) scans source code and
+reports found copyright holders and known licenses. Its approach is to detect
+licenses with a medium-sized dataset (roughly 200 regexes) of patterns and key
+phrases, and to reassemble the matches into detected licenses based on rules. In
+that sense it is somewhat similar to the combined approaches of FOSSology/Nomos
+and Ninka. It also detects copyright statements. It outputs results in plain text
+(with a customizable delimiter) or in the Debian copyright file format. Written
+in Perl.
+
+Auto-generating a `debian/copyright` can be easily accomplished with:
+
+```
+licensecheck --copyright -r `find * -type f` | \
+    /usr/lib/cdbs/licensecheck2dep5 > debian/copyright.auto
+```
+
+#### debmake
+
+[debmake](https://anonscm.debian.org/cgit/collab-maint/debmake.git) is a helper
+program to generate Debian packages, which contains options for checking
+copyright and license information (`-c`) and for comparing `debian/copyright`
+against the current sources (`-k`). Written in Python.
+
+Auto-generating a `debian/copyright` can be easily accomplished with:
+
+```
+debmake -cc > debian/copyright
+```
+
+Comparing `debian/copyright` against newly imported upstream sources can be
+done with:
+
+```
+debmake -k
+```
+
+It focuses on license types and file matching, and is able to detect ineffective
+blocks in the copyright file.
+
+It can be buggy due to faulty Unicode handling.
+
+#### license-reconcile
+
+An alternative for the comparison of `debian/copyright` versus the current
+source tree is also provided by
+[license-reconcile](https://anonscm.debian.org/cgit/pkg-perl/packages/license-reconcile.git).
+It reports missing copyright holders and years, but during testing it was
+confused by inconsistent license names.
+
+`license-reconcile` attempts to match license and copyright information in a
+directory with the information available in `debian/copyright`. It gets most of
+its data from `licensecheck`, so it should produce something worth looking at
+out of the box.
+
+However, for a given package it can be configured to succeed in a known good
+state, so that any failure on subsequent upstream updates draws attention to
+licensing changes that must be acknowledged.
+
+#### cme
+
+[cme](https://metacpan.org/release/App-Cme) is based on a configuration-parsing
+library:
+
+```
+cme update dpkg-copyright
+```
+
+This will create or update `debian/copyright`. The cme tool seems to handle
+UTF-8 names better than debmake. Written in Perl, using licensecheck.
+
+#### elbe-parselicense
+
+[elbe-parselicense](https://elbe-rfs.org/docs/sphinx/releases_v1.9.24/elbe-parselicence.html)
+generates a file containing the licenses of the packages included in a project.
+
+#### dlt
+
+[dlt](https://github.com/agustinhenze/dlt/) has support for parsing and creating
+Debian machine-readable copyright files. Written in Python.
+
+## Recommended tools
+
+Most of the tools discussed in the previous section are useful in one way or
+another, and some build on top of others. For the Apertis use case, it is
+advisable to use a tool which already provides a framework to deal with
+licenses and copyrights. The other tools can be hooked into different processes
+for particular use cases, if those are needed, or they can be used to double-
+or triple-check the output of other tools, if desired. A good starting point is
+FOSSology, which already provides a database and keeps track of licenses and
+copyrights; it supports SPDX and DEP5 output formats, and its architecture is
+easily extendable via plugins. Therefore this proposal recommends using
+FOSSology as a start. After the initial setup is accomplished and the workflow
+defined, it can be fine-tuned by considering the other tools or by extending
+FOSSology with such support.
+
+## Integration with current tools
+
+In the current Apertis CI infrastructure, there are several stages:
+* Phabricator (`code review`) - source code review system
+* Jenkins (`buildpackage CI`) - CI build per source package code changes
+* Open Build Service (`distro`) - contains all the distribution packages
+* Jenkins (`images`) - builds images from distributed package repository pools
+* LAVA (`testing`) - manages automated tests for different sets of images
+* Phabricator (`bugtracker`) - keeps track of image defects
+
+As an initial step, it looks plausible to hook FOSSology in after a new source
+package is added or updated in the Open Build Service. That way the FOSSology
+database should contain all the needed data regarding licenses and copyrights,
+and it can be queried to extract information when needed.
+
+## Approach
+
+The following proposal outlines the way FOSSology is meant to interact with the other parts of the system.
+
+
+
+Inputs
+* The FOSSology server will be fed with source code tarballs from the repositories, starting by adding the packages which make up the target runtime into the FOSSology bucket.
+* A list of the software packages that make up the target image runtime will be provided to FOSSology.
+
+Deliverable
+* An SPDX and/or DEP5 license report of the software packages found in the target runtime image (see the example below).
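+
+For illustration, a minimal fragment of such a DEP5 (machine-readable
+`debian/copyright`) report could look as follows; the package name and the
+people named here are invented for the example:
+
+```
+Format: https://www.debian.org/doc/packaging-manuals/copyright-format/1.0/
+Upstream-Name: example-package
+Source: https://example.org/example-package
+
+Files: *
+Copyright: 2016 Alice Artist <a.artist@example.com>
+License: MPL-2.0
+
+Files: tests/*
+Copyright: 2016 Andrew Contributor <a.contributor@example.com>
+License: GPL-2+
+```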
+
+Every release should have a license report.
+
+WIP:
+* Setup
+* Configuration
+* Clearing licenses
+* Rules setup
+* Day-to-day operation
+* Notifications
+* Generating a report
+
+TBD: FOSSology manual workflow for clearing licenses
+
+## References
+
+[Machine-readable debian/copyright file](http://dep.debian.net/deps/dep5/)
+
+[Creating, updating and checking debian/copyright semi-automatically](http://people.skolelinux.org/pere/blog/Creating__updating_and_checking_debian_copyright_semi_automatically.html)
+
+[debmake -- checking source against DEP-5 copyright](http://goofying-with-debian.blogspot.com/2014/07/debmake-checking-source-against-dep-5.html)
+
+[Improving creation of debian copyright file](https://ddumont.wordpress.com/2015/04/05/improving-creation-of-debian-copyright-file/)
+
+[scancode-toolkit wiki](https://github.com/nexB/scancode-toolkit/wiki)
+
+[Mozilla's Fossology investigation](https://wiki.mozilla.org/Fossology)
diff --git a/content/designs/list.md b/content/designs/list.md
new file mode 100644
index 0000000000000000000000000000000000000000..ba379e48aef0c4c9afb4943e0c253106331e76a4
--- /dev/null
+++ b/content/designs/list.md
@@ -0,0 +1,1212 @@
+---
+title: List design
+short-description: Architecture and API design for the list widgets in Apertis
+  (unimplemented)
+authors:
+  - name: Jonny Lamb
+  - name: Mathieu Duponchelle
+  - name: Philip Withnall
+---
+
+# List design
+
+The goal of this list design document is to establish an appropriate
+architecture and API design for the list widgets in the Apertis
+platform.
+
+Historically, the roller widget has provided a list widget on a cylinder
+with no conceptual beginning or end, which is manipulated naturally by the
+user. For non-cylindrical lists there was a separate widget with a
+different API and different usage. The goal is to consolidate list
+operations into a base class and be able to use the same simple API for
+both cylindrical lists and non-cylindrical lists.
+
+
+
+The above shows an example of the roller widget in use inside the music
+application. There are multiple roller widgets for showing album,
+artist, and song. Although they are manipulated independently, their
+contents are linked.
+
+## Terminology and concepts
+
+### Vehicle
+
+For the purposes of this document, a *vehicle* may be a car, car
+trailer, motorbike, bus, truck tractor, truck trailer, agricultural
+tractor, or agricultural trailer, amongst other things.
+
+### System
+
+The *system* is the infotainment computer in its entirety in place
+inside the vehicle.
+
+### User
+
+The *user* is the person using the system, be it the driver of the
+vehicle or a passenger in the vehicle.
+
+### Widget
+
+A *widget* is a reusable part of the user interface which can be changed
+depending on location and function.
+
+### User interface
+
+The *user interface* is the group of all widgets in place in a certain
+layout to represent a specific use-case.
+
+### Roller
+
+The *roller* is a list widget named after a cylinder which revolves
+around its central horizontal axis. As a result of being a cylinder it
+has no specific start and finish and appears endless.
+
+### Application author
+
+The *application author* is the developer tasked with writing an
+application using the widgets described in this document. They cannot
+modify the variant or the user interface library.
+
+### Variant
+
+A *variant* is a customised version of the system produced by a particular
+system integrator.
Usually variants are personalised with particular colour
+schemes and logos and potentially different widget behaviour.
+
+## Use cases
+
+A variety of use cases for list design are given below.
+
+### Common API
+
+An application author wants to add a list widget to their application.
+At that moment it is not known whether a simple list widget or a roller
+widget will suit the application better. Said application author doesn't
+want to have a high overhead in migrating code from one widget to
+another.
+
+### MVC separation
+
+A group of application authors wants to be able to split the work
+involved in developing their application into teams such that, adhering
+to the interfaces provided, they can develop the different parts of the
+application and easily put them together at the end.
+
+### Data backend agnosticity
+
+An application author wishes to display data stored in a database, and
+does not want to duplicate this data in an intermediary data structure
+in order to do so.
+
+### Kinetic scrolling
+
+The user wants to be able to scroll lists using their finger in such a
+way that the visual response of the list is as expected. Additionally,
+the system integrator wants the user to have visual feedback when the
+start or end of a list is reached by the list bouncing up and back down
+(the *elastic effect*). However, another system integrator wants to
+disable this effect.
+
+The user expectations include the following:
+
+  - The user expects the scroll to only occur after a natural threshold
+    of movement (as opposed to a tap), for the list to continue
+    scrolling after having removed their finger, and for the rate of
+    scroll to decrease with time.
+
+  - The user expects to be able to stop a scroll by tapping and holding
+    the scrolling area.
+
+  - The user expects a flick gesture to re-accelerate a scroll without
+    any visible stops in the animation.
+
+  - The user expects video elements to continue playing during a scroll.
+
+  - When there are not enough items to fill the entire height of the
+    list area, the user expects a scroll to occur but, using the elastic
+    effect, to fall back to the centre.
+
+  - The user expects a horizontal scroll gesture to not also scroll in
+    the vertical direction.
+
+### Roller focus handling
+
+In the roller widget, the user wants the concept of focus to be
+highlighted by the list scrolling such that the focused row is in the
+vertical centre.
+
+Additionally, the user wants to be able to easily focus another
+unfocused visible item in the list simply by pressing it.
+
+### Animations
+
+The user wants to have a smooth and natural experience in using either
+list widget. If the scrolling stops half-way into an item and it is
+required that one item is focused (see [](#roller-focus-handling)), they want
+the list to perform a small bounce scroll so that said item becomes
+focused.
+
+### Item launching
+
+An application author wants to be able to perform some
+application-specific behaviour in response to the selection of an item
+in the list. However, they want to provide some confirmation before
+launching an item in a list. They want the two step process to be:
+
+1. The desired item is focused by scrolling to it or tapping on it.
+
+2. The focused item is tapped again, which confirms the intention to
+   launch it.
+
+### Header and footer
+
+The application author wants to add a header to the column to make it
+clear exactly what is in said column. (An example can be seen in
+[][List design] in the music application.)
+
+Another system integrator wants the column names to be shown above the
+widget in a footer instead of a header.
+
+### Roller rollover
+
+In the roller widget, by definition, the user wants to scroll from the
+last item in the list back to the first without having to go all the way
+back up.
+
+Additionally, the user wants the wrap around to be made more visually
+obvious with increased resistance when scrolling past the fold between
+end and start again.
+
+### Widget size
+
+The application author wants any list widget to expand into space
+allocated to it by its layout manager. If there are not enough items in
+the list to fill all available space, said application author wants the
+remaining space to be blank, but still used by the list widget.
+
+### Click activation
+
+A system integrator wants to choose between single click and double
+click activation (see [](#item-launching)) for use in the list widgets. This
+is only expected once the item has already been focused (see also
+[](#roller-focus-handling)).
+
+The decision of single or double click is given to the system integrator
+instead of the application author in order to retain a consistent user
+experience.
+
+### Consistent focus
+
+The user focuses an item in the list and a new item is added. The user
+expects the new item not to change the scroll position of the list and,
+more importantly, not to change the currently focused row.
+
+### Focus animation
+
+An application author wants an item in the list to be able to perform an
+animation after being focused.
+
+### Mutable list
+
+An application author wants to be able to change the contents of a list
+shown in the user interface after having created the widget and shown it
+in the user interface.
+
+### UI customisation
+
+A system integrator wants to change the look and feel of a list widget
+without having to change any of the application code.
+
+### Blur effect
+
+A system integrator wants items in the list to be slightly blurred when
+scrolling fast to exaggerate the scrolling effect. Another system
+integrator wants to disable this blur.
+
+### Scrollbar
+
+A system integrator wants a scrollbar to be visible for controlling the
+scrolling of the list. Another system integrator doesn't want the
+scrollbar visible.
+
+### Hardware scroll
+
+A system integrator wants to use hardware buttons to facilitate moving
+the focus up and down the list. The system integrator wants the list to
+scroll in pages and for the focus to remain in order. For example, a
+list contains items A, B, C, and D. When the *down* hardware button
+is pressed, the page moves down to show items E, F, G, and H, and the
+focus moves to item E as it is the first on the page.
+
+### On-demand item resource loading
+
+The music application lists hundreds of albums but the application
+author doesn't want the album art thumbnails to be loaded for every item
+immediately, as it would take too long and slow the system down. Instead,
+said application author wants each album art thumbnail to load only once
+visible, with a placeholder picture in place until then.
+
+### Scroll bubbles
+
+A system integrator wants [bubbles] to appear when scrolling and
+disappear when scrolling has stopped.
+
+### Item headers
+
+An application author wants to display items in a list but have a
+logical separation into sections. For example, in a music application,
+listing all tracks of an artist and separating by album.
+
+Another application author wants said headers to stick to the top of the
+widget so they are always visible, even if the first item has been
+scrolled past and is no longer visible.
+
+### List with tens of thousands of items
+
+An application author wants to display a list containing tens of thousands
+of items, but does not want to incur the initial cost of instantiating
+all the items when creating the list.
+
+
+### Flow layout
+
+An application author wants the list widget to render as a grid with
+multiple items on the same line. The following video shows such a grid layout.
+
+<video width="640" height="480" controls>
+  <source src="media/album_cover_roller.mp4" type="video/mp4">
+  <source src="media/album_cover_roller.ogv" type="video/ogg">
+</video>
+
+### Concurrent presentation of the same model in different list widgets
+
+An application author wants to present the same model in two side-by-side
+list widgets, possibly with different filtering or sorting.
+
+## Non-use cases
+
+A variety of non-use cases for the list design are given below.
+
+### Tree views
+
+An application author wants to show the filesystem hierarchy in their
+application. They understand that multi-dimension models (or trees),
+where items can be children of other items, are not supported by the
+Apertis list widgets (hence the name list).
+
+### List widget without a backing model
+
+An application author wants to display a list of items in the list widget,
+but does not wish to create a model and pass it to the list widget,
+and would rather use helper functions in the list widget, such as
+`list_add_item(List *list, ListItem *item)`.
+
+Such an interface is not considered necessary, at least for this
+version of the design document, because we want to encourage use of models
+so that the UI views themselves can be rearranged more easily.
+
+If, in the future, such an interface was considered desirable, its
+API should be similar to the [`GtkListBox`] API, such as
+`gtk_list_box_row_changed()`.
+
+### Sticky header and footer
+
+An application developer wants an actor to stick to the top or
+the bottom of the list widget, and always be visible regardless
+of scrolling.
+
+This is best handled as a separate actor, sharing a common parent
+with the list widget.
+
+## Requirements
+
+### Common API
+
+There should be a common API between both list widgets (see [](#common-api)).
+Changing from a list widget to a roller widget, or the other way around,
+should involve only trivial code changes to achieve the change in
+behaviour.
+
+### MVC separation
+
+The separation between components that use the list widgets should be
+functional and enable application authors and system integrators to swap
+out parts of applications easily and quickly (see [](#mvc-separation)).
+
+The implementation of the model should be of no impact to the
+functionality of the widget. As a result the widget should only refer to
+the model using an interface which all models can implement.
+
+### Data backend agnosticity
+
+The widget should not require application authors to store their backing
+model in any particular way.
+
+### Kinetic scrolling
+
+Both list widgets should support kinetic scrolling from user inputs (see
+[](#kinetic-scrolling)). That is, when the user scrolls using their finger,
+they can *flick* the list up or down and the scroll will continue
+after the finger is released and gradually slow down. This animation
+should feel natural to the user, as if they are moving a wheel up or
+down, with minimal friction.
The animation should also be easily stopped +by tapping once. + +#### Elastic effect + +In the list widget with a defined start and finish, on trying to scroll +there should be visual feedback that the start or finish of the list has +been reached. This visual feedback should be accomplished using the +*elastic effect*. That is, when the bottom is reached and further +downward scrolling is attempted, an empty space slowly appears with +resistance, and pops back when the user releases their finger. + +This is not necessary on the roller widget because the list loops and +there is no defined start and finish to the list. + +It should be easy to turn this feature off as it may be undesired by the +system integrator (see [](#kinetic-scrolling)). + +### Item focus + +In both list and roller widgets there should be a concept of focus which +only one item has at any one point. How to display which item has focus +depends on the widget. + +### Roller focus handling + +In the roller widget the focused item should always be in the vertical +centre of the widget (see [](#roller-focus-handling)). The focused item +should visually change and even expand if necessary to demonstrate its +focused state (see also [](#focus-animation)). + +Changing which item is focused should be possible by clicking on another +item in the list. + +### Animations + +It should be possible to add animations to widgets to allow for moving +the current scroll location of the list up or down (see [](#animations)). +This should be customisable by the system integrator and application +author depending on the application in question but should retain the +general look and feel across the entire system. + +### Item launching + +Focused items (see [][Item focus]) should be able to be launched using +widget-specific bindings (clicks or touches) (see [][Click activation]). + +### Header and footer + +It should be possible to add a header to a list to provide more context +as to what the information is showing (see [](#header-and-footer) and the +screenshot in [][List design]). This should be customisable by the +application author and should be consistent across the entire system. + +### Roller rollover + +The rollover of the two list widgets should be different and +customisable by the system integrator (see [](#roller-rollover) and +[](#ui-customisation)). + +The roller widget should roll over from the end of the list back to the +beginning again, like a cylinder would (see [][List design] and [][Roller]). +Additionally the system integrator should be able to customise whether +they want extra resistance in going back to the beginning. This is +visual feedback to ensure the user knows they are returning to the +beginning of the list. + +The non-roller list widget should not have a rollover and should have a +well-defined start and finish, with visual effects as appropriate (see +[][Elastic effect]). + +### Widget size + +The list widgets should expand to fill out all space that has been +provided to them (see [](#widget-size)). They should fill any space not +required with a blank colour, specified by the variant UI customisation +(see [](#ui-customisation)). + +### Consistent focus + +The focus of items in a list should remain consistent despite +modification of the list contents (see [](#consistent-focus)). Adding items +before or after the currently focused item shouldn't change its focused +state. 
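+
+As a purely illustrative sketch of how an implementation could satisfy this
+requirement (the requirement itself mandates no particular implementation), a
+list widget could track the focused item as an object reference rather than as
+a position, and re-resolve the position whenever the model changes. The
+`ExampleList` structure and callback below are hypothetical:
+
+``` c
+#include <gio/gio.h>
+
+typedef struct {
+  GObject *focused_item;     /* strong reference to the focused item */
+  guint    focused_position; /* last known position of that item */
+} ExampleList;
+
+/* Handler for GListModel::items-changed: keep the focus on the same
+ * item object even when items are added or removed around it. */
+static void
+example_list_items_changed (GListModel *model,
+                            guint       position,
+                            guint       removed,
+                            guint       added,
+                            gpointer    user_data)
+{
+  ExampleList *self = user_data;
+  guint n_items = g_list_model_get_n_items (model);
+
+  for (guint i = 0; i < n_items; i++)
+    {
+      g_autoptr(GObject) item = g_list_model_get_item (model, i);
+
+      if (item == self->focused_item)
+        {
+          /* The focused item may have moved; record its new position
+           * without scrolling the view or changing the focused row. */
+          self->focused_position = i;
+          return;
+        }
+    }
+
+  /* The focused item itself was removed: fall back to the nearest
+   * remaining position. */
+  self->focused_position = n_items > 0 ? MIN (position, n_items - 1) : 0;
+}
+```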
+
+### Focus animation
+
+The application author and system integrator should be able to specify
+whether there is an animation in selecting an item in a list (see
+[](#focus-animation) and [](#ui-customisation)). This could mean expanding an item to
+make the focused item larger vertically and even display extra controls
+which were previously hidden under the fold.
+
+During this animation, input should not be possible.
+
+### Mutable list
+
+The items shown in the list widgets and their content should update
+dynamically when the model backing the widget is updated (see
+[](#mutable-list)). This should require no extra effort on the part of the
+application author.
+
+### UI customisation
+
+Both list widgets should be visibly customisable in the same way the
+rest of the system is and should honour UI customisations made by the
+system integrator (see [](#ui-customisation)). In this way, the list widgets
+should use CSS (see the UI Customisation Design document) for styling.
+
+### Blur effect
+
+The list widget should support slightly blurring list items only when
+scrolling (see [](#blur-effect)). It should be easy for another system
+integrator who doesn't want the blur to disable this feature.
+
+### Scrollbar
+
+The list widget should support showing and hiding a scrollbar as
+necessary (see [](#scrollbar)). It should be easy for another system
+integrator who doesn't want to display a scrollbar to disable this feature.
+
+### Hardware scroll
+
+The list widget should support scrolling using hardware buttons and
+therefore always have one item focused ([](#hardware-scroll)). Hardware
+button callbacks should use the [adjustments][mx-scrollable] on the list widget to
+change the subset of visible items and the appropriate list widget
+function for moving the focus to the next item. Hardware button signals
+are generated as described in the Hardkeys Design.
+
+### On-demand item resource loading
+
+Items in the list need to know when they are visible (and not past the
+current scroll area) so they know when to load expensive resources, such
+as thumbnails from disk (see [][On-demand item resource loading]).
+
+### Scroll bubbles
+
+The scrollbar (see also [](#scrollbar)) should support showing bubbles to
+show the scroll position (see [](#scroll-bubbles)). It should be possible to
+disable the bubble and change its appearance when necessary.
+
+### Item headers
+
+It should be possible to add separating headers to sets of items in the
+list widgets (see [](#item-headers)). Said headers should also be sticky if
+specified.
+
+### Lazy list model
+
+It should be possible to provide a ‘lazy list store’ to the widget,
+in which items would be created on demand, when they need to be
+displayed to the user.
+
+This model could make memory usage and instantiation performance independent
+of the number of items in the model.
+
+See [][List with tens of thousands of items].
+
+### Flow layout
+
+It should be possible for *n* items, each of the same width and height, to be
+packed in the same row of the list, where *n* is calculated as the floor of
+the list width divided by the item width (for example, 200 pixel wide items in
+a 720 pixel wide list give *n* = 3). There is no need for the programmer
+to set *n* manually.
+
+### Reusable model
+
+The underlying model should not have to be duplicated in order to present it
+in multiple list widgets at the same time.
+
+See [][Concurrent presentation of the same model in different list widgets].
+
+## Approach
+
+### Adapter interface
+
+As required by [](#data-backend-agnosticity1), the backing data model format
+should not be imposed by the list widget upon the application developer.
+
+As such, an ‘adapter’ is required, similar to Android's [`ListAdapter`]. This
+adapter bridges the list widget and the data that backs the list,
+by formatting data from the underlying model as list item widgets
+for rendering.
+
+The following diagram illustrates how this adapter helps decouple the list
+widget from the underlying model.
+
+
+
+> In the above example, we assume a program that simply displays a list widget
+> exposing items stored in a database, and an adapter that stores strong
+> references to the created list items, and will eventually cache them all
+> if the list widget is fully scrolled down by the user. This is as opposed to
+> the approach presented in [][Lazy list model] where memory usage is also
+> taken into account.
+>
+> The ‘cursor’ is a representation of whatever database access API is in use,
+> as most databases use cursor-based APIs for reading.
+
+An interface for this adapter (the contents of the list widgets) is
+required such that it can be swapped out easily where necessary
+(see [](#mvc-separation), [][Lazy list model]).
+
+GLib recently (since version 2.44) added an interface for this very
+task. [`GListModel`] is an implementation-agnostic interface for
+representing lists in a single dimension. It does not support tree
+models (see [][Tree views]) and contains everything required to satisfy the
+requirements specified in this document.
+
+It should be noted that `GListModel`, which is for arbitrary containers, is
+entirely unrelated to the `GList` data structure, which is for doubly linked
+lists.
+
+In addition to functions for accessing the contents of the adapter, there is
+also an `items-changed` signal for notifying the view (the list widget and list
+item widgets it contains; see [](#mvc-separation))
+that it should re-render as a result of something changing in the adapter.
+
+### GtkListBox
+
+[`GtkListBox`] is a GTK+ widget added in 3.10 as a replacement for the
+very complicated [`GtkTreeView`] widget. `GtkTreeView` is used for
+displaying complex trees with customisable cell renderers, but more
+often lists are used instead of trees.
+
+`GtkListBox` doesn't have a separate model backing it (but one can be
+used), and each item is a `GtkListBoxRow` (which is in turn a `GtkWidget`).
+This makes using the widget and modifying its contents especially easy
+using the [`GtkContainer`] functions. Items can be activated (see
+[](#item-launching)) or selected (see [][Item focus]).
+
+`GtkListBox` has been used in many GNOME applications since its addition
+and has shown that its API is sufficient for most simple use cases, with
+a limited number of items.
+
+However `GtkListBox` is not scalable, as its interface requires that all its
+rows be instantiated at initialisation, in order for example to add headers
+to sections, and still be able to scroll accurately to any random position
+in the list (random access).
+
+As such, its API is only of limited interest to us, particularly when it
+comes to [](#item-headers) or [](#filtering).
+
+### GtkFlowBox
+
+[`GtkFlowBox`] is a GTK+ widget added in 3.12 as a complement to `GtkListBox`.
+Its API takes a similar approach to that of `GtkListBox`: it doesn't have a
+separate model backing it – but one can be used – and each item is a
+`GtkFlowBoxChild` which contains the content for that item.
+
+As with `GtkListBox`, its API is interesting to us for its approach to
+reflowing children; see [][Column layout].
+
+### Widget size
+
+The list widgets should expand to fill space assigned to them (see
+[](#widget-size)). This means that when there are too few items to fill space
+the remaining space should be filled appropriately, but when there are
+more items than can be shown at one time the list should be put into a
+scrolling container.
+
+In Clutter, actors are made to expand to fill the space they have been
+assigned by setting the `x-expand` and `y-expand` properties on
+`ClutterActor`. For example:
+
+``` c
+/* this actor will expand into horizontal space, but not into vertical
+ * space, allocated to it. */
+clutter_actor_set_x_expand (first_actor, TRUE);
+clutter_actor_set_y_expand (first_actor, FALSE);
+
+/* this actor will expand into vertical space, but not into horizontal
+ * space, allocated to it. */
+clutter_actor_set_x_expand (second_actor, FALSE);
+clutter_actor_set_y_expand (second_actor, TRUE);
+
+/* this actor will stretch to fill all allocated space, the
+ * default behaviour. */
+clutter_actor_set_x_align (third_actor, CLUTTER_ACTOR_ALIGN_FILL);
+
+/* this actor will be centered inside the allocation. */
+clutter_actor_set_x_align (fourth_actor, CLUTTER_ACTOR_ALIGN_CENTER);
+```
+
+More details can be found in the [`ClutterActor`] documentation.
+
+The list item widgets (as described in [](#adaptermodel-implementation)) are packed
+by the list widget and so application authors have no control over their
+expanding or alignment.
+
+A suitable scrolling container to put a list widget into is the
+[`MxKineticScrollView`] as it provides kinetic scrolling (see [](#kinetic-scrolling)
+and [][Elastic effect]) using touch events. Additionally, the
+`MxKineticScrollView` should be put into an `MxScrollView` to get a
+scrollbar where appropriate (see [](#scrollbar)).
+
+To support `MxKineticScrollView`, the list widgets should implement the
+`MxScrollable` interface, which allows getting and setting the
+adjustments, and is necessary for showing a viewport onto the list.
+
+The exact dimensions in pixels for the widget shouldn't be specified by
+the application author, as doing so makes changes to the appearance desired
+by a system integrator much more difficult to achieve.
+
+### Adapter/Model implementation
+
+As highlighted before (in [][Adapter interface]), the list widget should
+make no assumption about how the backing data is stored. An adapter data
+structure should be provided, making the bridge between the backing data
+and the list widget, by returning list item actors for any given position.
+
+The `GListModel` interface requires all its contained items to be
+[`GObject`s with the same `GType`][GListModel-desc].
+
+It is suggested that the items themselves are all instances of a new
+`ListItem` class, which will inherit from `ClutterActor`, and implement
+selection and activation logic.
+
+[`GListStore`] is an object in GLib which implements `GListModel`. It
+provides functions for inserting and appending items to the model but no
+more. For small lists, it is suggested to either use `GListStore` directly or
+implement a thin subclass to give more type safety and better-adapted function
+signatures.
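+
+As a sketch of this simple approach (`MY_TYPE_LIST_ITEM` and
+`my_list_item_new()` stand in for the `ListItem` class proposed in this
+document; they are not part of any existing library):
+
+``` c
+#include <gio/gio.h>
+
+static GListStore *
+build_small_artist_list (void)
+{
+  GListStore *store = g_list_store_new (MY_TYPE_LIST_ITEM);
+  MyListItem *item = my_list_item_new ("Some Artist");
+
+  /* The store takes its own reference to the item; in this simple case
+   * it holds the list item widgets directly. */
+  g_list_store_append (store, item);
+  g_object_unref (item);
+
+  return store;
+}
+```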
+
+In these simple cases, `GListStore` will act as both the adapter *and* the
+backing model, as it is storing `ListItem` widgets. For more complicated use
+cases (where the same data source is being used by multiple list widgets), the
+adapter and backing model must be separated, and hence `GListStore` is not
+appropriate as the adapter in those cases. See [][Decoupled model].
+
+### Decoupled model
+
+As shown in [][Adapter interface], the list widget will not directly
+interact with the underlying data model, but through an ‘adapter’.
+
+The following diagram shows how the same underlying model may be queried
+by two different list widgets, and their corresponding adapters.
+
+
+
+> List widgets are the outermost boxes in the diagram; the adapters are the
+> next boxes inwards; and the middle box is the shared data model (a database).
+>
+> The ‘cursors’ in the above diagram are a representation of whatever database
+> access API is in use, as most databases use cursor-based APIs for reading.
+
+### Lazy object creation
+
+`GListModel` allows an implementation to create items lazily (only creating
+or updating the items which are on screen or about to be displayed when a
+scroll is initiated) for performance reasons. This is recommended for
+applications with a very large number of items, so a new `ListItem` isn't
+required for every single item in the list at initialisation.
+
+`GListStore` does not support lazy object creation, so an alternative model
+will need to be implemented by applications which need to deal with
+huge models.
+
+An example for this is provided in [][Alternative list adapter].
+
+### High-level helpers
+
+Higher-level API should be provided in order to facilitate common usage
+scenarios, in the form of an adapter implementation.
+
+This adapter should be instantiable from various common data models, through
+constructors such as `list_adapter_new_from_g_list_model` or
+`list_adapter_new_from_g_list`.
+
+This default adapter should automatically generate an appropriate UI for
+the individual objects contained in the data model, with the only requirement
+that they all be instances of the same GObject subclass. This requirement
+should be clearly documented, as it won't be possible to enforce it at
+instantiation time for certain data models, such as `GList`, without iterating
+over all their nodes, which would forbid the generic adapter from adopting a
+lazy loading strategy.
+
+The default behaviour of the adapter should be to provide a UI for all the
+properties exposed by the objects (provided it knows how to handle them),
+but much like the [Django admin site][django-admin-fields], it should be easy
+for the user to modify which of these properties are displayed, and the
+order in which they should be displayed, using a `set_fields()` method. The
+suggested `set_fields()` API would take a non-empty ordered list of property
+names for the properties of the objects in the model which the adapter should
+display. For example, if the model contains objects which represent music
+artists, and each of those objects has `name`, `genre` and `photo` properties,
+the adapter would try to create row widgets which display all three
+properties. If `set_fields (['name', 'genre'])` were called on the adapter, it
+would instead try to only display the name and genre for the artist (name first,
+genre second), and not their photo. The layout algorithm used for presenting
+these properties generically in the UI is not prescribed here.
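+
+A minimal sketch of this suggested usage (all of these symbols are part of
+the API proposed above, not an existing library, and the exact signature of
+`set_fields()` is an assumption):
+
+``` c
+/* Create the generic adapter from an existing GListModel of artists. */
+ListAdapter *adapter =
+    list_adapter_new_from_g_list_model (G_LIST_MODEL (artists));
+
+/* Only display the name and genre of each artist, in that order,
+ * hiding the photo property. */
+const gchar *fields[] = { "name", "genre", NULL };
+list_adapter_set_fields (adapter, fields);
+```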
+
+This generic adapter should expose virtual methods to allow potential
+subclasses to provide their own list item widgets for properties that may or
+may not be handled by the default implementation, and to provide their own list
+item widgets for each of the GObjects in the model. These are represented by
+`create_view_for_property()` and `create_view_for_object()` on the
+[][API diagram].
+
+The adapter should use weak references on the created list items, as
+exemplified in [][Alternative list adapter].
+
+Filtering and sorting should be implemented by this adapter, with the option
+for the user to provide implementations of the sorting and filtering methods,
+and to trigger the sorting and filtering of the adapter. It should be clearly
+documented that these operations may be expensive, as the adapter will have
+no option other than to iterate over the whole model.
+
+If developers want to use the list widget with an underlying model that
+allows more efficient sorting and filtering (for example a database), they
+should implement their own adapter.
+
+Refer to the [][API diagram] for a more formal view of the proposed API,
+and to the [][Generic adapter] section for a practical usage example.
+
+### UI customisation
+
+The list and list item widgets should be `ApertisWidget` subclasses (which
+are in turn `ClutterActor`s) to take advantage of the `GtkApertisStylable`
+mixin that `ApertisWidget` uses. This adds support for styling the widget
+using CSS and [other style providers][GtkStyleProvider] which can be customised by
+system integrators.
+
+As the list item widgets are customisable widgets, they can appear any
+way the application author wants. This means that it is up to the
+application author to make theming decisions. Apertis-provided list
+item widgets will clearly document the CSS classes that affect their
+appearance.
+
+### Sorting
+
+Sorting is built into `GListStore`: when adding a new item
+to the adapter, `g_list_store_insert_sorted` is used, with the third
+argument pointing to a function which sorts the model. All items can be
+sorted at once using the `g_list_store_sort` function, passing in the
+same or a different sorting function.
+
+When using an [][Alternative list adapter], sorting will need to be
+implemented on a case-by-case basis.
+
+### Filtering
+
+As with `GtkListBox` when bound to a model, filtering should be implemented
+by updating the contents of the adapter.
+
+The list widget will be connected to the adapter, and will update itself
+appropriately when notified of changes.
+
+An example of this is shown in [the next section](#filtering1); the following
+diagram illustrates the filtering process.
+
+
+
+> The ‘cursors’ in the above diagram are a representation of whatever database
+> access API is in use, as most databases use cursor-based APIs for reading.
+
+### Header and footer
+
+The header should be a regular `ApertisWidget`, passed to the list widget as a
+`header` property. It will be rendered above the body of the list widget.
+Similarly, an `ApertisWidget` passed to a `footer` property will be rendered
+below the body of the list widget. Using arbitrary widgets means the header’s
+appearance is easily customisable. Passing them to the list widget means that
+the list widget can set the widgets’ width to match that of the list.
+
+Applications can set either, both, or neither, of the `header` and `footer`
+properties.
If both are set, both a header and a footer are rendered, and the
+application may use different widgets in each.
+
+### Selections
+
+As with `GtkListBox`, it should be possible to select either one or many
+items in the list (see [][Item focus]). The application author can decide
+the behaviour of the list in question using the “selection
+mode”. The values for the selection mode are none (no items can be
+selected), single (at most one item can be selected), and multiple (any
+number of items can be selected). A multiple selection example can be found
+in the [][Multiple selection] section.
+
+The `ListItem` object has a read-write property for determining whether it
+can be selected or not. An example that sets this property can be found
+in the [][Non-selectable items] section.
+
+The selection signals exposed by the list widget will be analogous to those
+exposed by `GtkListBox`, namely `item-selected` when an item is focused,
+`item-activated` when the item is subsequently activated, and in the case
+of multiple selection, the `selected-items-changed` signal will give the user
+the full picture of the current items selected by the user, by being emitted
+every time the current selection changes, and passing an updated list of
+selected list items to potential callback functions.
+
+### Item headers
+
+Item headers are widgets used to separate items into logical groups (for
+example, when showing tracks of an artist, they could be grouped into
+albums with the album name as the item header).
+
+Due to the requirement for lazy initialisation, the solution proposed
+by `GtkListBox` cannot be adopted here, as it implies that all the list
+items need to be instantiated ahead of time.
+
+Our approach here is similar to [the solution][ListAdapter-separators] used
+with Android's `ListAdapter`: as the adapter is decoupled from the data model,
+and returns the actors that should be placed at a given position in the list,
+it may also account for such headers, which should be returned as unselectable
+`ListItem`s at specific positions in the adapter.
+
+We make no assumptions as to how implementations may choose to associate
+a selected list item with the data model: in the simple case, the index
+of a list item may be directly usable to access the backing object it
+represents, if the backing data is stored in an array, and no item header
+is inserted in the adapter.
+
+In other cases, where for example the backing data is stored in a database,
+or item headers are inserted and offset the indices of the following items,
+implementations may choose to store a reference to the backing object using
+`g_object_set_qdata` or an external mechanism such as a `GHashTable`.
+
+An example of item headers is shown in [the following section][Header items].
+
+### Sticky item headers
+
+As required for [][Lazy list model], when the list widget is scrolled to
+a random point, items surrounding the viewport may not be created yet.
+
+It is proposed that a simple API be exposed to let application developers
+specify the sticky item at any point in the scrolling process, named
+`list_set_sticky_func`.
+
+The implementation will, upon scrolling the list widget, pass this function
+the index of the top-most visible item, and expect it to return the `ListItem`
+that should be stickied (or `NULL`).
+
+### Blur effect
+
+Given that the `MxKineticScrollView` container does the actual scrolling, it is
+the best place to implement the desired blur effect (see [](#blur-effect)).
+Additionally, implementing it in the container means it can be
+implemented once instead of in every widget that needs to have a blur
+effect.
+
+### On-demand item resource loading
+
+By default, list items should assume they are not in view and should not
+perform expensive operations until they have been signalled by the list
+widget that they are in view (see [][On-demand item resource loading]). For
+example, a music application might have a long list of albums, with each
+item showing the album cover. Instead of loading every single album
+cover when creating the list, each list item should use a default dummy
+picture in its place. When the user scrolls the list, revealing
+previously hidden items, the album cover should be loaded and the
+default dummy picture replaced with the newly loaded picture.
+
+The list widget should have a way of determining the item from given
+co-ordinates so that it can signal to said item when it comes into view
+after a scroll event. The list item object should only perform expensive
+operations when it has come into view, by reading and monitoring a
+property on itself. Once visible, an item will not return to the
+not-visible state.
+
+### Scroll bubbles
+
+The bubble displayed when scrolling, to indicate which category the
+scroll position has reached (see [](#scroll-bubbles)), should be added to
+`MxScrollBar`, as that is the widget that controls changing the scroll
+position.
+
+### Roller subclass
+
+The roller widget should be implemented as a subclass of the list widget.
+
+The roller widget will implement the `MxScrollable` interface, setting
+appropriate increment values on its adjustments, in order to ensure the
+currently-focused row will always be aligned with the middle after scrolling.
+
+As the roller subclass will implement [rollover](#roller-rollover), the
+elastic effect when reaching the bottom of the list will not be used.
+
+In addition, it will, in its `init` method, use a `ClutterEffect` in
+order to render itself as a cylinder (or ideally, as a [hexagonal prism]).
+
+### Column layout
+
+By default, the list widget will display as many items as fit per row, given
+the list’s width and the configured item width. Properties will be provided for:
+ * A `row-spacing` property, setting the blank space between adjacent rows.
+ * A `column-spacing` property, setting the blank space between adjacent
+   columns.
+ * An `item-width` property, setting the width of all columns.
+ * An `item-height` property, setting the height of all rows.
+
+Note that `GtkFlowBox` supports reflowing children of different sizes; in this
+design, we only consider children of equal sizes, which simplifies the API. It
+is equivalent to considering a `GtkFlowBox` with its `homogeneous` property set
+to true.
+
+So, for example, to display items in a single column (one item per row), set the
+`item-width` to equal the list’s width. To implement a grid layout where some
+items may be missing and gaps should be left for them, implement a custom row
+widget which displays its own children in a grid; and display one such row
+widget per row of the list.
+
+We suggest that the default behaviour for `item-width` is to track the list’s
+width, so that the list uses a single column unless otherwise specified.
+
+### Focus animation
+
+A `use-animations` property will be exposed on the list items. Upon activation,
+an item with this property set to `TRUE` will be animated to cover the whole
+width and height allocated to the list widget.
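+
+For instance, assuming the property name proposed above, an application
+might opt an item into this behaviour with a single property assignment:
+
+``` c
+/* "use-animations" is the property proposed in this design; item is one
+ * of the ListItem actors handed to the list widget. */
+g_object_set (item, "use-animations", TRUE, NULL);
+```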
+
+Application developers may connect to the `item-activated` signal in order to
+update the contents of the item, as shown in [][Item activation].
+
+Once the set of possible and valuable animations has become clearer, API
+may be exposed to give more control to system integrators and application
+developers.
+
+### Item launching
+
+In the roller subclass, the `item-activated` signal should only be emitted
+for actors in the currently focused row. This will ensure that only items
+that were scrolled to can be activated.
+
+### API diagram
+
+
+
+Signals are shown with a hash beforehand (for example, `#item-activated`),
+with arguments listed afterwards. Properties are shown with a plus sign
+beforehand and without getters or setters (`get_model()`, `set_model()`)
+for brevity.
+
+## Example API usage
+
+### Basic usage
+
+The following example creates a model, creates item actors for each
+artist available on the system, and adds them to the model. The exact
+API is purely an example, but the approach it demonstrates is the
+important part.
+
+As a simple example, this avoids creating a separate adapter and model, and
+instead creates a model which contains the list row widgets. In more complex
+examples, where data from the model is being used by multiple list widgets,
+the model and adapter are separate, and the entries in the model are not
+necessarily widget objects. See [][Generic adapter] for such an example.
+
+{{ ../examples/sample-list-api-usage-basic.c }}
+
+The object created by `create_sample_artist_item` is an instance of
+`ClutterActor` (or an instance of a subclass) which defines how the item
+will display in the list. In this case, it is as simple as packing in a
+`ClutterText` to display the name of the artist as a string.
+More likely it would use a `ClutterBoxLayout` layout manager to pack
+different `ClutterActor`s into a horizontal line showing properties of the
+artist.
+
+### Item activation
+
+The following example creates a list widget using the function defined
+in the previous section and connects to the `item-activated` signal to
+change the list item actor.
+
+{{ ../examples/sample-list-api-usage-item-activated.c }}
+
+### Filtering
+
+The following example shows how to filter the list store bound to the list
+widget created using the function implemented in [][Basic usage]
+to only display items with a specific property.
+
+{{ ../examples/sample-list-api-usage-filtering.c }}
+
+### Multiple selection
+
+The following example sets the selection mode to allow multiple items to
+be simultaneously selected.
+
+{{ ../examples/sample-list-api-usage-selection.c }}
+
+### Non-selectable items
+
+The following example makes half the items in a list impossible to select.
+
+{{ ../examples/sample-list-api-usage-non-selectable-items.c }}
+
+### Header items
+
+The following example adds alphabetical header items to the list.
+
+{{ ../examples/sample-list-api-usage-header-items.c }}
+
+### On-demand item resource loading
+
+The following example shows how on-demand item resource loading could be
+implemented, using the model created in the first listing:
+
+{{ ../examples/sample-list-api-usage-resource-loading.c }}
+
+### Generic adapter
+
+The following example shows how developers may take advantage of the proposed
+generic adapter.
+
+{{ ../examples/sample-list-adapter-api-usage.c }}
+
+### Alternative list adapter
+
+Authors of applications do not necessarily need to use a `GListStore` as their
+adapter class.
Instead, they can implement the `GListModel` interface and
+pass it as the adapter for the list widget.
+
+In the following example, we define and implement an adapter to optimise
+both initialisation performance, by creating `MyArtistItem`s only when required,
+and memory usage, by letting the `List` widget assume ownership of the created
+items.
+
+{{ ../examples/alternative_list_model.c }}
+
+## Requirements
+
+This design fulfils the following requirements:
+
+ - [](#common-api1) — the list widget and roller widget have the same
+   API and any roller-specific API is in its own class.
+
+ - [](#mvc-separation1) — `GListModel` is used as an adapter to the backing
+   data, whose storage format is not imposed on the user. The list widget
+   and item widgets are separate objects.
+
+ - [](#data-backend-agnosticity1) — applications provide list items through
+   an adapter; no requirement is made as to the storage format.
+
+ - [](#kinetic-scrolling1) — use `MxKineticScrollView`.
+
+ - [][Elastic effect] — use `MxKineticScrollView`.
+
+ - [][Item focus] — list items can be selected ([][Selections]).
+
+ - [](#roller-focus-handling1) — this roller-specific selecting
+   behaviour can be added to the roller's class.
+
+ - [](#animations1) — use `MxKineticScrollView`.
+
+ - [](#item-launching1) — the `item-activated` signal on the list widget
+   will signal when an item is activated.
+
+ - [](#header-and-footer1) — `ApertisWidget`s can be set as header or footer
+   ([](#header-and-footer2)).
+
+ - [](#roller-rollover1) — this roller-specific rollover behaviour can
+   be added to the roller's class.
+
+ - [](#widget-size1) — use `ClutterLayoutManager` and the `ClutterActor`
+   properties ([](#widget-size2)).
+
+ - [](#consistent-focus1) — the API asserts a consistent focus and
+   ensures the implementation behaves correctly when the model changes.
+
+ - [](#focus-animation1) — items are `ClutterActor`s which can animate
+   using the regular Clutter API.
+
+ - [](#mutable-list1) — use `GListStore`.
+
+ - [](#ui-customisation1) — subclass `ApertisWidget` and use the
+   `GtkStyleProvider`s.
+
+ - [](#blur-effect1) — add a `motion-blur` property to
+   `MxKineticScrollView` and use that ([](#blur-effect2)).
+
+ - [](#scrollbar1) — use the `scroll-visibility` property on the
+   `MxScrollView` container.
+
+ - [](#hardware-scroll1) — use the adjustment on the list widget to
+   scroll down a page, and use the appropriate function to move the
+   selection on.
+
+ - [][On-demand item resource loading1] — ensure the list widget can look
+   up the item based on co-ordinates, and add a property to the list
+   item object to denote whether it’s in view, which the list widget
+   updates.
+
+ - [](#scroll-bubbles1) — add support for overlay actors to
+   `MxScrollBar`.
+
+ - [](#item-headers1) — added as regular `ListItem`s in the adapter; a
+   `list_set_sticky_func` API is exposed.
+
+ - [][Lazy list model] — see [][Alternative list adapter] for an example
+   of how application developers may implement their own model.
+
+ - [][Flow layout] — the number of columns is calculated by dividing the
+   list width by the specified item width. Padding between the resulting
+   columns and rows may be specified using `row-spacing` and `column-spacing`
+   properties.
+
+## Summary of recommendations
+
+As discussed in the above sections, we recommend:
+
+ - Write a list widget partially based on `GtkListBox`, which subclasses
+   `ApertisWidget`.
+
+ - Write a list item widget partially based on `GtkListBoxRow` which also
+   subclasses `ApertisWidget`.
+
+ - Add a `motion-blur` property to `MxKineticScrollView`.
+
+ - Expose a sticky item callback registration method.
+
+ - Add support for overlay actors to `MxScrollBar`.
+
+ - Write a Roller widget as a list widget subclass.
+
+ - Ensure new widgets are easily customisable using CSS.
+
+ - Add demo programs for the new widgets.
+
+ - Define unit tests to run manually using the example programs to
+   check the widgets work correctly.
+
+## Appendix
+
+### Existing roller design
+
+
+
+
+
+[bubbles]: http://i.stack.imgur.com/YyRtC.png
+
+[mx-scrollable]: https://github.com/clutter-project/mx/blob/master/mx/mx-scrollable.h
+
+[`GListModel`]: https://developer.gnome.org/gio/stable/GListModel.html
+
+[`GtkListBox`]: https://developer.gnome.org/gtk3/stable/GtkListBox.html
+
+[`GtkFlowBox`]: https://developer.gnome.org/gtk3/stable/GtkFlowBox.html
+
+[`GtkTreeView`]: https://developer.gnome.org/gtk3/stable/GtkTreeView.html
+
+[`GtkContainer`]: https://developer.gnome.org/gtk3/stable/GtkContainer.html
+
+[`ClutterActor`]: https://developer.gnome.org/clutter/stable/ClutterActor.html
+
+[`MxKineticScrollView`]: https://github.com/clutter-project/mx/blob/master/mx/mx-kinetic-scroll-view.h
+
+[GListModel-desc]: https://developer.gnome.org/gio/stable/GListModel.html#GListModel.description
+
+[`GListStore`]: https://developer.gnome.org/gio/stable/GListStore.html
+
+[GtkStyleProvider]: https://developer.gnome.org/gtk3/stable/GtkStyleProvider.html
+
+[`ListAdapter`]: https://developer.android.com/reference/android/widget/ListAdapter.html
+
+[ListAdapter-separators]: http://stackoverflow.com/questions/18302494/how-to-add-section-separators-dividers-to-a-listview
+
+[hexagonal prism]: http://static.kidspot.com.au/cm_assets/32906/hexagonal-prism_346x210-jpg-20151022203100.jpg~q75,dx330y198u1r1gg,c--.jpg
+
+[django-admin-fields]: https://docs.djangoproject.com/en/1.10/ref/contrib/admin/#django.contrib.admin.ModelAdmin.fields
diff --git a/content/designs/long-term-reproducibility.md b/content/designs/long-term-reproducibility.md
new file mode 100644
index 0000000000000000000000000000000000000000..f180840cfd62d0b0566651a23d6bf26dd04686aa
--- /dev/null
+++ b/content/designs/long-term-reproducibility.md
@@ -0,0 +1,413 @@
+---
+title: Long term reproducibility
+short-description: Approaches for supporting Apertis-based products in the long term
+authors:
+  - name: Emanuele Aina
+---
+
+# Background
+
+One of the main goals for Apertis is to provide teams with the tools to support
+their products for the long lifecycles needed in many industries, from civil
+infrastructure to automotive.
+
+This document discusses some of the challenges related to long-term support and
+how Apertis addresses them, with particular interest in reliably reproducing
+builds over a long time span.
+
+Apertis addresses that need by providing stable release channels as a platform
+for products with a clear tradeoff between freshness and stability. Apertis
+encourages products to track these channels closely to deploy updates on a
+regular basis to ensure important fixes reach devices in a timely manner.
+
+Stable release channels are supported for at least two years, and product teams
+have three quarters of overlap to rebase to the next release before the old one
+reaches end of life. Depending on the demand, Apertis may extend the support
+period for specific release channels.
+
+However, for debugging purposes it is useful to be able to reproduce old builds
+as closely as possible.
This document describes the approach chosen by Apertis
+to address this use case.
+
+For our purposes bit-by-bit reproducibility is not a goal, but the aim is to
+be able to reproduce builds closely enough that one can reasonably expect that
+no regressions are introduced. For instance, some non-essential variations,
+such as timestamps or items being listed differently in places where order is
+not significant, cause builds to not be bit-by-bit identical even though the
+runtime behavior is not affected.
+
+# Apertis artifacts and release channels
+
+As described in the [](release-flow.md) document, at any given time Apertis
+has multiple active release channels, both to provide a stable foundation for
+product teams and to give them full visibility on the latest developments.
+
+Each release channel has its own artifacts, the main ones being the
+[deployable images](https://apertis.org/images/) targeting the [reference
+hardware platforms](https://www.apertis.org/reference_hardware/), which get
+built by mixing:
+
+* reproducible build environments
+* build recipes
+* packages
+* external artifacts
+
+These inputs are also artifacts themselves in moderately complex ways:
+* build environments are built by mixing dedicated recipes and packages
+* packages are themselves built using dedicated reproducible build environments
+
+However, the core principle for maintaining multiple concurrent release
+channels is that each channel should have its own set of inputs, so that
+changes in a channel do not impact other channels.
+
+Even within a channel it is sometimes desirable to reproduce a past build
+as closely as possible, for instance to deliver a hotfix to an existing
+product while minimizing the chance of introducing regressions due to
+unrelated changes. The Apertis goal of reliable, reproducible builds not
+only helps developers in their day-to-day activities, but also gives
+them the tools to address this specific use-case.
+
+The first step is to ensure that all the inputs to the build pipeline are
+version-controlled, from the pipeline definition itself to the package
+repositories and to any external data.
+
+To track which inputs got used during the build process, the pipeline stores
+an identifier for each of them to uniquely identify them. For instance,
+the pipeline saves all the git commit hashes, Docker image hashes, and
+package versions in the output metadata.
+
+
+
+While the pipeline defaults to using the latest version available in a specific
+channel for each input, it is possible to pin specific versions to closely
+reproduce a past build using the identifiers saved in its metadata.
+
+
+
+## Reproducible build environments
+
+A key challenge in the long term maintenance of a complex project is the
+ability to reproduce its build environment in a consistent way. Failing to do
+so means that undetected differences across build environments may introduce
+hard-to-debug issues, or that builds may fail entirely depending on where/when
+they get triggered.
+
+In some cases, losing access to the build environment effectively means that
+a project can't be maintained anymore, as no new build can be made.
+
+To avoid these issues as much as possible, Apertis makes heavy use
+of [isolated containers based on Docker
+images](image-build-infrastructure.md#docker-images-for-the-build-environment).
+
+All the Apertis build pipelines run in containers with minimal access to
+external resources to keep the impact of the environment as low as possible.
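+
+As an illustrative sketch (the image path follows the pattern used by the
+Apertis registry, but the exact name and tag here are placeholders), a
+developer can enter the same containerised environment locally:
+
+```
+docker run --rm -it \
+    registry.gitlab.apertis.org/infrastructure/apertis-docker-images/v2020-image-builder:latest
+```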
+
+For the most critical components, even the container images themselves are
+created using Apertis resources, minimizing the reliance on any external
+service and artifact.
+
+For instance, the `apertis-v2020-image-builder` container image provides
+the reproducible environment to run the pipelines building the reference
+image artifacts for the v2020 release, and the
+`apertis-v2020-package-source-builder` container image is used to convert the
+source code stored on GitLab into a format suitable for building on OBS.
+
+Each version of each image is identified by a hash, and possibly by some tags.
+As an example, the `latest` tag points to the image which gets used by default
+for new builds. However, it is possible to retrieve arbitrarily old images by
+specifying the actual image hash, providing the ability to reliably reproduce
+arbitrarily old build environments.
+
+By default the Docker registry where images are published keeps all the past
+versions, so every build environment can be reproduced exactly.
+
+Unfortunately this comes with a significant cost from a storage point of view,
+so each team needs to evaluate the tradeoff that best fits their goals,
+in the spectrum that goes from keeping all Docker images around for the whole
+lifespan of the product, to more aggressive pruning policies involving the
+deletion of old images on the assumption that changes in the build environment
+have a limited effect on the build, and that using an image version which is
+close to, but not exactly, the original one gives acceptable results.
+
+To make build environments even more reproducible, care can be taken to
+make their own build process as reproducible as possible.
+The same concerns affecting the main build recipes affect the recipes for the
+Docker images, from storing pipelines in git, to relying only on snapshotted
+package archives, to taking extra care with third-party downloads, and the
+following sections address those concerns for both the build environments
+and the main build process.
+
+## Build recipes
+
+The process to build the reference images is described by textual, YAML-based
+[Debos recipes](image-build-infrastructure.md#image-building-process) stored in
+a git repository, with a different branch for each release channel.
+
+The textual, YAML-based GitLab-CI pipeline definitions then control how the
+recipes are invoked and combined.
+
+Relying on git for the definition of the build pipelines makes preserving old
+versions and tracking changes over time trivial.
+
+Rebuilding the `v2020` artifacts locally is then a matter of checking out the
+recipes in the `apertis/v2020` branch and launching `debos` from a container
+based on the `apertis-v2020-image-builder` container image.
+
+By forking the repository on GitLab the whole build pipeline can be reproduced
+easily with any desired customization under the control of the developer.
+
+## Packages and repositories
+
+The large majority of the software components shipped in Apertis are
+packaged using the Debian packaging format, with the source code stored in
+GitLab that OBS uses to generate pre-built binaries to be published in an
+APT-compatible repository.
+
+Separate git branches and OBS projects are used to track packages and versions
+across different parallel releases; see the [](release-flow.md) document for
+more details.
+
+For instance, for the v2020 stable release:
+
+* the `apertis/v2020` git branch tracks the source revisions to be landed in the
+  main OBS project
+* the `apertis:v2020:{target,development,sdk}` projects build the stable
+  packages
+* the `deb https://repositories.apertis.org/apertis/ v2020 target development sdk`
+  entry points `apt` to the published packages
+
+Most of the time the stable channel is frozen and updates
+are exclusively delivered through the dedicated channels described below.
+
+Updates are split between small security fixes with a low chance of regressions,
+and updates that also address important but non-security-related issues which
+usually benefit from more testing.
+
+For security updates:
+* the git branch is `apertis/v2020-security`
+* the OBS projects are `apertis:v2020:security:{target,development,sdk}`
+* `deb https://repositories.apertis.org/apertis/ v2020-security target development sdk`
+  is the APT repository
+
+Similarly, for the general updates:
+* the git branch is `apertis/v2020-updates`
+* the OBS projects are `apertis:v2020:updates:{target,development,sdk}`
+* `deb https://repositories.apertis.org/apertis/ v2020-updates target development sdk`
+  is the APT repository
+
+On a quarterly basis the stable channel gets unfrozen and all the updates get
+rolled into it, while the `security` and `updates` channels get emptied.
+
+This approach provides downstreams and product teams with a stable basis on
+which to build their products, without hard-to-control changes. Products are
+recommended to also track the security channel for timely fixes, enabling
+product teams to easily identify and review the changes shipped through it.
+
+The updates channel is not directly meant for production, but it offers
+product teams a preview of the pending changes to let them proactively detect
+issues before they reach the stable channel and thus their products.
+
+While the stability of the release channels is suitable for most use-cases,
+sometimes it is desirable to reproduce an old build as closely as possible to
+the original, ignoring any updates regardless of their importance.
+
+To accomplish that goal the package archives are snapshotted regularly,
+storing their full history. The image build pipeline accepts an optional
+parameter to use a specific snapshot rather than the latest contents.
+This results in the execution installing exactly the same packages and
+versions as the original run, regardless of any changes that landed in the
+archive in the meantime.
+
+To use a snapshot it is sufficient to change the APT mirror address, for
+instance going from `https://repositories.apertis.org/apertis/` to
+`https://repositories.apertis.org/apertis/20200305T132100Z`, and similarly
+for product-specific repositories.
+
+Every time an update is published from OBS a snapshot is created, tracking
+the full history of each archive.
+More advanced use-cases can be addressed using the optional
+[Aptly HTTP API](https://www.aptly.info/doc/api/).
+
+## External artifacts
+
+While the packaging pipeline effectively forbids any reliance on external
+artifacts, the other pipelines in some cases include components not covered by
+the previously mentioned systems to track per-release resources.
+
+For instance, the recipes for the HMI-enabled images include a set of
+example media files retrieved from a `multimedia-demo.tar.gz` file hosted on
+an Apertis webserver.
+
+Another example is given by the `apertis-image-builder` recipe checking out
+Debos directly from the master branch on GitHub.
+
+In both cases, any change to the external resources directly impacts all the
+release channels when building the affected artifacts.
+
+A minimal solution for `multimedia-demo.tar.gz` would be to put a version in its
+URL, so that recipes can be updated to download new versions without affecting
+older recipes. Even better, its contents could be put in a version tracking
+tool, for instance using the Git-LFS support available on GitLab.
+
+In the Debos case it would be sufficient to encode in the recipe a specific
+revision to be checked out. A more robust solution would be to use the packaged
+version shipped in the Apertis repositories.
+
+## Main artifacts and metadata
+
+The purpose of the previously described software items is to generate a set of
+artifacts, such as those described in [the v2019 release artifacts
+document](release-v2019-artifacts.md). Alongside the artifacts themselves, a few
+metadata entries are generated to help track what has been used during the build.
+
+In particular, the `pkglist` files capture the full list of packages installed
+on each artifact along with their versions. The `filelist` files instead provide
+basic information about the actual files in each artifact.
+
+With the information contained in the `pkglist` files it is possible to find
+the exact binary package version installed and from there find the
+corresponding commit for the sources stored on GitLab by looking at the
+matching git tag.
+
+Other files capture other pieces of information that can be useful to reproduce
+builds. For instance the `build-url` file points to the full log of the build,
+where the recipe commit hash and the Docker image hash can be identified;
+however, in the [](#implementation-plan) section a few improvements are
+described to make that information easier to retrieve and use.
+
+## Package builds
+
+Package builds happen on OBS, which does not have snapshotting capabilities and
+always builds every package in a clean, isolated environment built using the
+latest package versions for each channel.
+
+Since the purposes taken into account in this document do not involve large-scale
+package rebuilds, it is recommended to use the SDK images and the devroots in
+combination with the snapshotted APT archives to rebuild packages in an
+environment closely matching a past build.
+
+# Recommendations for product teams
+
+Builds for production should:
+
+1. pick a specific stable channel (for instance, `v2020`)
+1. version control the build pipelines using branches specific
+   to a stable channel
+1. in the build pipeline, use the latest Docker image for that specific
+   channel, for instance `v2020-image-builder` or a product-specific
+   downstream image based on that
+1. use the main OBS projects for the release channel, for instance
+   `apertis:v2020:target`, with the security fixes from
+   `apertis:v2020:security:target` layered on top
+1. store the product-specific packages in OBS projects targeting a specific
+   release channel, layered on top of the projects mentioned in the
+   previous point
+1. use the matching APT archives during the image build process
+1. deploy fixes from the stable channels as often as possible
+
+Development builds are encouraged to also use the contents from the
+non-security updates (for instance, `apertis:v2020:updates:target`) to get a
+preview of the non time-critical updates that will be folded into the main
+archive on a quarterly basis.
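+
+Putting the channels together, the APT configuration for such a development
+build might combine the repository entries shown earlier (product-specific
+entries would be layered alongside):
+
+```
+deb https://repositories.apertis.org/apertis/ v2020 target development sdk
+deb https://repositories.apertis.org/apertis/ v2020-security target development sdk
+deb https://repositories.apertis.org/apertis/ v2020-updates target development sdk
+```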
+
+The assumption is that products will use custom build pipelines tailored
+to the specific hardware and software needs of the product. However, product
+teams are strongly encouraged to reuse as much as possible from the reference
+Apertis build pipelines using the GitLab CI and Debos include mechanisms,
+and to follow the same best practices about metadata tracking and build
+reproducibility described in this document.
+
+# Implementation plan
+
+## Snapshot the package archive
+
+To ensure that builds can be reproduced, it is fundamental to make the same
+contents available from the package archive.
+
+The most common approach, also employed in Debian upstream, is to take
+snapshots of the archive contents so that subsequent builds can point to the
+snapshotted version and retrieve the exact package versions originally used.
+
+To provide the needed server-side support, the archive manager needs to be
+switched to `aptly`, as it provides explicit support for snapshots.
+The build recipes then need to be updated to capture the current
+snapshot version and to be able to optionally specify one when initiating
+the build.
+
+Due to the way APT works, the increase in storage costs for the snapshot is
+small, as the duplication is limited to the index files, while the package
+contents are deduplicated.
+
+## Point to the recipe commit hash
+
+The current metadata do not capture the exact commit hash of the recipe used
+for the build. This is an extremely important bit of information to reproduce
+the build, and it can be captured easily.
+
+## Capture the Docker image hash
+
+For full reproducibility it is recommended to use the exact image originally
+used, but to be able to do so the image hash needs to be stored in the
+metadata for the build.
+
+## Version control external artifacts
+
+External artifacts like the sample multimedia files need to be versioned just
+like all the other components. Using Git-LFS and git tags would give the build
+recipe fine control over what gets downloaded.
+
+## Link to the tagged sources
+
+The package name and package version as captured in the `pkglist` files are
+sufficient to identify the exact sources used to generate the packages
+installed on each artifact, as they can be used to identify an exact commit.
+
+However, the process can be further automated by providing explicit hyperlinks
+to the tagged revision on GitLab.
+
+# How to reproduce a release build and customize a package
+
+## Identify the recipe and build environment
+
+1. Open the folder containing the build artifacts, for instance
+   `https://images.apertis.org/release/v2021dev1/v2021dev1.0/`
+1. Find the `recipe-revision.txt` metadata file,
+   for instance `https://images.apertis.org/release/v2021dev1/v2021dev1.0/meta/recipe-revision.txt`
+1. The `recipe-revision.txt` metadata file points to a specific commit in a
+   specific git repository, for instance
+   `https://gitlab.apertis.org/infrastructure/apertis-image-recipes/commit/cf6bfb79ea3163465c529868bf333f83d40d2b1a`
+1. The `apt-snapshot.txt` metadata file indicates the snapshot of the APT
+   package archive used for the build
+1. The `docker-image.txt` file reports the Docker image name and hash used for
+   the build, for instance
+   `registry.gitlab.apertis.org/infrastructure/apertis-docker-images/v2021dev1-image-builder:cf381b5e78f2`
+
+Once all the input metadata are known, the build can be reproduced.
+
+## Reproduce the build
+
+1. On GitLab, [fork](https://docs.gitlab.com/ee/user/project/repository/forking_workflow.html#creating-a-fork)
+   the previously identified recipe repository
+1. In the forked repository on GitLab, [create a new branch](https://docs.gitlab.com/ee/user/project/repository/web_editor.html#create-a-new-branch-from-a-projects-dashboard)
+   pointing to the commit identified in the steps above (for instance, `cf6bfb79ea3163465c529868bf333f83d40d2b1a`)
+1. [Execute a CI pipeline](https://docs.gitlab.com/ee/ci/pipelines.html#manually-executing-pipelines)
+   on the newly created branch, specifying parameters for the exact Docker
+   image revision and the APT snapshot identifier
+
+When the pipeline completes, the produced artifacts should closely match the
+original ones, albeit not being bit-by-bit identical.
+
+## Customizing the build
+
+On the newly created branch in the forked recipe repository, changes can be
+committed just like on the main repository.
+
+For instance, to install a custom package:
+
+1. Check out the forked repository
+1. Edit the relevant ospack recipe to install the custom package, either by
+   adding a custom APT archive in the `/etc/apt/sources.list.d` folder if
+   available, or retrieving and installing it with `wget` and `dpkg` (small
+   packages can even be committed as part of the repository to run quick
+   experiments during development)
+1. Commit the results and push the branch
+1. Execute the pipeline as described in the previous section
diff --git a/content/designs/maintaining-workspace-across-sdk-updates.md b/content/designs/maintaining-workspace-across-sdk-updates.md
new file mode 100644
index 0000000000000000000000000000000000000000..0cb9033d394722a1a79c8793f89f8463ec16c4d3
--- /dev/null
+++ b/content/designs/maintaining-workspace-across-sdk-updates.md
@@ -0,0 +1,194 @@
+---
+title: Maintaining workspace across SDK updates
+authors:
+  - name: Andre Moreira Magalhaes
+---
+
+# Background
+
+The SDK is distributed as a VirtualBox image, and developers make changes to
+adjust the SDK to their needs. These changes include installing tools and
+libraries, changing system configuration files, and adding content to their
+workspace. There is one VirtualBox image for each version of the SDK, and
+currently a version upgrade requires each developer to manually migrate their
+SDK customization to the new version. This migration is time-consuming,
+and involves frustrating and repetitive work.
+
+One additional problem is the need some product teams have to support different
+versions of the SDK at the same time. The main challenge in this scenario is
+the synchronization of the developer’s customizations between multiple
+VirtualBox images.
+
+The goal of this document is to define a model to decouple developer
+customization from SDK images, thus allowing the developer to have
+persistence for workspace, configuration, and binary packages (libraries
+and tools) over different SDK images.
+
+# Use cases
+
+ * SDK developer wants to share the workspace among different SDK images
+   with minimal effort. In particular, the user doesn't want to have to
+   rely on manually copying the workspace across SDK images in order to keep
+   them in sync.
+
+ * SDK developer wants a simple way to share custom system configuration
+   (i.e. changes to `/etc`) across SDK images.
+
+ * SDK developer wants to keep tools and libraries selection in
+   sync over different SDK images.
+
+# Solution
+
+For addressing workspace persistence, and partially addressing tools and
+libraries synchronization across different SDK images, the following options
+were considered:
+ * Use [VirtualBox shared folders] as mount points for the `/home` and `/opt`
+   directories
+ * Ship a preconfigured second disk as part of the SDK images using
+   the [OVF format] (`.ova` files)
+ * Use a second (optional) disk with partitions for the `/home` and `/opt`
+   directories and leave it to the developer to set up the disk. Helper scripts
+   would then be provided to help the developer set up the disk
+   (e.g. set up partitions, mountpoints, copy existing content of the `/home` and
+   `/opt` directories, etc.)
+
+The use of shared folders would be ideal here given that the setup would be
+simpler while also allowing the developer to easily share data between the host
+and guest (SDK).
+[The problem with shared folders][VirtualBox shared folders and symlinks] is
+that they don't support the creation of symlinks, which is essential for
+development given that symlinks are frequently used when configuring a source
+tree to build.
+
+However, the issue with symlinks is nonexistent when using a secondary disk,
+as the disk can be partitioned and formatted using a filesystem that
+supports them, making it a viable option here.
+
+While the option to ship a preconfigured second disk as part of the SDK images
+(using the OVF format) seems like a better approach at first, it brings some
+limitations:
+ * The disk/partition sizes would be limited to what is preconfigured during
+   the image build
+ * Although some workarounds exist for VirtualBox to use `.vdi` images
+   (the native VirtualBox image format) in `.ova` files, this is not officially
+   supported and VirtualBox will even convert any `.vdi` file to `.vmdk` format
+   when exporting an appliance using the OVF format
+ * In order to allow the same disk to be used by multiple virtual machines at
+   the same time (concurrently), VirtualBox requires the disk to be made
+   [shareable][VirtualBox image write modes], which in turn requires fixed-size
+   disks (not dynamically allocated). While this may not be a common use case,
+   some developers may still want it to be supported, in which case the SDK
+   images would have a huge increase in size, thus impacting
+   download/bandwidth/etc.
+
+That said, we recommend the use of a second disk configured by the developers
+themselves. This should give the developer more flexibility, while avoiding the
+limitations of using the OVF format.
+Helper scripts could also be provided to ease the work of setting up the second
+disk.
+Another advantage of this solution is that current SDK users can also rely on
+it the same way as new users would.
+
+However it is important to note that using this option would also impact QA,
+as it would need to support the two different setups (with and without a second
+disk) for proper testing.
+
+It is also important to note that while this solution partially addresses
+tools and libraries synchronization among different SDK images, it won't cover
+synchronization of tools/libraries installed outside the developer workspace
+or the `/opt` directory.
+Supporting it for any tools/libraries, regardless of where they are installed,
+would be quite complex and not practically viable for several reasons, such as
+the fact that `dpkg` uses a single database for installed packages.
+
+For that reason we recommend that developers who want to keep their package
+installation in full sync among different images do it manually.
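+
+A possible manual workflow for this, using standard Debian tooling (the
+commands are real; the file path is illustrative):
+
+```
+# On the reference SDK image: record the current package selection.
+dpkg --get-selections > ~/package-selections.txt
+
+# On another SDK image: replay the selection and install the packages.
+sudo dpkg --set-selections < ~/package-selections.txt
+sudo apt-get dselect-upgrade
+```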
+
+To address synchronization of system configuration changes (i.e. `/etc`)
+the following options were considered:
+ * Use OverlayFS on top of `/etc`
+ * Use symlinks in the second disk (e.g. on `/opt/etc`) for each configuration
+   file changed
+
+Although the use of an [OverlayFS] seems simpler at first, it has some
+drawbacks, such as the fact that after an update, changes stored in the developer
+customization layer are likely to cause dependency issues and hide changes to
+the content and structure of the base system.
+For example, if a developer upgrades an existing SDK image (or downloads a new one)
+and sets up the second disk/partition as an overlay for this image's `/etc`, it may
+happen that if the image had changes to the same configuration files present
+in the overlay, these changes would simply be ignored and it would be hard for the
+user to notice.
+
+The other option would be to use symlinks in the second disk for each configuration
+file changed. While this would require a bit more effort to set up, it would
+at the same time give the user more control/flexibility over which configuration files
+get used, and also make it easier to notice changes in the default image
+configuration, given that it is likely that the user would check the original/system
+configuration files before replacing them with a symlink.
+
+For this option, the user would still have to manually create the symlinks in all
+the SDK images that should share the configuration, but that process could be eased
+with the use of helper scripts to create and set up the symlinks.
+
+Note that this approach may also cause some issues, such as the fact that some specific
+software may not work with symlinked configuration files, or that early boot could
+potentially break if there are symlinks to e.g. `/opt`.
+
+Given that the most common use cases for customizing system configuration would be
+to set up things like a system proxy (e.g. `cntlm`), and that not many customizations
+are expected, the recommended approach would be to use symlinks, as it would allow
+the user to have more control over the changes.
+
+As mentioned above, no single solution would work for all use cases, and
+developers/users should evaluate the best approach based on their requirements.
+
+# Implementation notes
+
+To set up a new second disk, the following would be required:
+ * Create a new empty disk image
+ * Add the disk to the SDK image in question using the VirtualBox UI
+ * Partition and format the disk accordingly
+ * Set up mountpoints (i.e. `/etc/fstab`) such that the disk is mounted during
+   boot
+ * Copy the existing content of `/home` and `/opt` to the respective new disk
+   partitions, such that things like the default user files/folders are
+   properly populated on the new disk
+
+Optionally, if the developer plans to use the same disk across multiple SDK
+instances at the same time, they must create a fixed-size disk as described above
+and mark it as `shareable` using the VirtualBox UI.
+
+To set up an existing disk on a new SDK image, the following would be required:
+ * Add the existing disk to the SDK image in question using the VirtualBox UI
+ * Set up mountpoints (i.e. `/etc/fstab`) such that the disk is mounted during
+   boot
+
+As mentioned above, helper scripts could be provided to ease this work.
+A script could, for example, do all the work of partitioning/formatting the
+disk, setting up the mountpoints and copying existing content over to the new
+partitions when in setup mode, or only set up the mountpoints otherwise.
+
+For system configuration changes, considering the recommended approach,
+the same or another script could also be used to set up the symlinks based
+on the content of `/opt/etc` when setting up the disk (the sketch above
+includes such a step).
+It is recommended that the content of `/opt/etc` mimics the directory
+structure and filenames of the original files in `/etc`, such that a script
+can walk through all directories and files in `/opt/etc` and create the
+corresponding symlinks in `/etc`.
+
+The user would still have to manually install the packages living outside
+`/opt` or the user workspace, but that can easily be done by retrieving the
+list of installed packages on one image (e.g. using `dpkg
+--get-selections`) and using that list to install the packages on other
+images.
+
+[VirtualBox shared folders]: https://www.virtualbox.org/manual/ch05.html#sharedfolders
+
+[VirtualBox shared folders and symlinks]: https://www.virtualbox.org/ticket/10085
+
+[VirtualBox image write modes]: https://www.virtualbox.org/manual/ch06.html#hdimagewrites
+
+[OVF format]: https://www.virtualbox.org/manual/ch02.html#ovf-about
+
+[OverlayFS]: https://en.wikipedia.org/wiki/OverlayFS
diff --git a/content/designs/media-management.md b/content/designs/media-management.md
new file mode 100644
index 0000000000000000000000000000000000000000..d32776b9e1510028dfa2635fd382925d08eae280
--- /dev/null
+++ b/content/designs/media-management.md
@@ -0,0 +1,1920 @@
+---
+title: Media management
+short-description: Management (indexing, browsing) of media content
+  (partially-implemented, libraries and services need to be glued into an
+  integrated solution)
+authors:
+  - name: Mateu Batle
+  - name: Sjoerd Simons
+---
+
+# Media management
+
+## Introduction
+
+This document covers the management of media content in the Apertis
+platform. There are several types of media content to handle in the
+platform: images, audio, video and documents. We can identify the
+following operations on media:
+
+  - **Media Indexing**: extracting metadata from media content and
+    storing it in a format that allows fast retrieval.
+
+  - **Media Browsing**: locating media content and accessing its
+    metadata.
+
+[][Solution] provides a general overview of the technologies used,
+like an executive summary of [][Appendix: Media management technologies],
+as well as a high level view of the solution proposed. Additionally, it
+exposes in detail the media management requirements in the Apertis
+platform, providing an analysis as well as a solution for each
+requirement, which might involve modifying existing technologies or even
+creating new ones.
+
+Although this document is mostly focused on media content, the
+technologies introduced are related to other features in the platform
+like global search, which allows searching not only media content but
+also applications, messages, calendar events, etc. For details on
+global search please check its specific design.
+
+[][Appendix: Media management technologies] is mostly used as
+reference material from other sections of the document, so it is not
+necessary to read it from start to finish. It has a detailed description
+of the current state of the technologies used for media management,
+without including the specific requirements, additions and modifications
+described in [][Solution].
+
+This document assumes the adoption of a media-centric approach for
+applications (every media source provider will have its own application
+for browsing and playback).
+This provides a customized, fully-featured experience for each of the
+media provider services. The following media content providers have been
+identified as requirements; these services are analyzed in more detail in
+[][Solution].
+
+  - Local Storage.
+
+  - Removable Storage Devices.
+
+  - CD and DVD.
+
+  - DLNA (UPnP).
+
+  - Media Online Services: YouTube, Shoutcast, Dropbox, last.fm,
+    podcasts, etc.
+
+  - Bluetooth AVRCP.
+
+## Solution
+
+The following sections provide a high level view of the technologies and
+solutions, followed by a detailed analysis of the requirements for the
+supported media content sources.
+
+### Technology and Solution Overview
+
+This document looks at what changes could be made to the open source
+components to better support the Apertis use cases. It is important to
+note that those changes may not be possible within the scope of this
+project and may not be accepted upstream.
+
+See below an enumeration and a brief overview of the main technologies
+used in the design:
+
+  - **Tracker** is a central repository for user information. It is made
+    of several components: Tracker Miner, Tracker Extract and Tracker
+    Store. Tracker Miner automatically crawls for media content files.
+    Tracker Extract gathers useful metadata from these files and stores
+    it in the Tracker Store database. Metadata can be retrieved from the
+    Tracker Store with SPARQL queries. See [][Tracker] for more details.
+    Although this document will only focus on the Tracker features
+    specific to media indexing, Tracker can be used to store other
+    information as well, like applications, messages, calendar events,
+    etc., or in general any information that is worth sharing between
+    applications.
+
+  - **Grilo** is a simple API for browsing media content and providing
+    media content metadata. The Grilo layer helps to hide the complexity
+    of Tracker and its query language by focusing on media content
+    (Tracker is much more generic). See [][Grilo] for more details.
+
+  - **Tumbler**. It is a service for accessing and caching thumbnails.
+    See [][Thumbnail management] for more details.
+
+  - **libsoup** and **librest** are libraries simplifying the creation
+    of HTTP clients/servers and the access to REST-based services,
+    respectively. See [][Librest and Libsoup].
+
+  - **libgdata** is a library implementing the Google Data Protocol. It
+    provides access to Google services like YouTube and Picasa, among
+    others.
+
+The proposed solution combines Grilo, Tumbler and Tracker for locating
+media content and retrieving its metadata from the local system and
+removable storage. Tracker does the heavy work: filesystem crawling,
+metadata extraction and metadata storage. Grilo is a simple API which
+lies on top of Tracker, used by applications to discover media content
+and its metadata. Tumbler is responsible for thumbnail generation.
+
+Tracker's scheduling algorithms need to be modified to support the
+requirements. The goal is to prioritize the different tasks of
+information retrieval, so that what applications need first is retrieved
+first. There are different cases depending on the specific requirements:
+
+  - Prioritization done automatically by Tracker in a hard-coded way
+    (not configurable), like gathering all metadata from the filesystem
+    (filename, size, modification time, etc.) before extracting metadata
+    from the file contents.
+
+  - Prioritization done automatically but configurable, like
+    prioritizing the indexing of music files over video files.
+
+  - Prioritization influenced or requested by upper layers. In some
+    cases, upper layers need to provide some clues about what needs to
+    be done first or what is more important, like a picture viewer
+    application boosting the priority of metadata extraction for image
+    files (instead of the default, which could be music files).
+
+The details on Grilo API stability can be checked in the API stability
+design. In summary, it is still a young API and it will be broken in
+version 0.2. Under this situation, it might be convenient to layer an
+Apertis SDK API on top of the Grilo API to improve API stability for the
+application layer.
+
+See this illustration for an overview of the general architecture. Some of
+the components listed will be introduced with more detail in the
+following chapters.
+
+
+
+### **Local Storage Media Source**
+
+**Requirement R1**. Support local storage as a media source.
+
+**Analysis**. The system has storage memory to store media locally.
+Locating media content in the system local storage and retrieving its
+metadata is required.
+
+**Solution**. Collabora proposes a combination of Tracker and Grilo as
+a powerful solution for this endeavor (see
+[][Technology and solution overview]). Tracker can be reviewed in detail
+in [][Tracker], and Grilo in [][Grilo]. Upper layers will just interact
+with the Grilo layer, which is a simple API specialized in media
+browsing, hiding the complexity of Tracker.
+
+Grilo allows browsing, searching and locating the media content in the
+system. The application can access the media content through the
+filesystem API via the URI (Uniform Resource Identifier), e.g.
+file:///home/username/Music/song1.ogg.
+
+See requirement R5 for comments on public and private content.
+
+**Status**. Satisfied.
+
+### **Media Browsing Requirements**
+
+#### **File-system based browsing**
+
+**Requirement R2.** Support filesystem based browsing for early access.
+
+**Analysis.** This is required in order to quickly render a user
+interface to the user, for example when plugging in a USB flash device.
+Removable devices are potentially slow and it takes time to actually
+index and capture all metadata, so information like author and album
+might not be available in time. Therefore, a filesystem view should be
+available through the media browsing framework itself at least, in order
+to provide quick access to the media content by browsing the filesystem
+structure; as opposed to other ways to browse content using the metadata
+(by author, album, etc.).
+
+**Solution.**
+
+There is a Grilo Filesystem plugin. This is the fastest way to access
+the filesystem entries in the device. Content would be available soon
+after the filesystem is mounted on the system. Additionally, this plugin
+already monitors and reports changes to directories and files.
+One disadvantage of the Grilo Filesystem plugin is that it could be hard
+to access the metadata or get notified about changes in an efficient
+way.
+
+Another solution would be to use the Grilo Tracker plugin. Grilo plugins
+provide access to the media content in a hierarchical way. The Grilo
+Tracker plugin has two modes of hierarchical navigation, one based on
+categories and another one based on the filesystem. The latter provides
+the information in the same structure as it is stored in the filesystem,
+and allows browsing from a root folder or from specific folders.
+However, the information has to already be available in the Tracker
+Store for this to work. To minimize this delay, the Tracker scheduler
+will be changed to gather filesystem information before other media
+metadata. Obtaining the filesystem information is very fast compared to
+the extraction of the metadata (which involves reading the file
+contents). Some timings have been gathered to show this fact; check the
+table in [][Appendix: Questions & Answers] for the details. This
+solution plays nicely with requirement R3 (to get notifications of
+metadata as soon as it is available) and with R8 and R13 (regarding the
+scheduling of operations like crawling, metadata extraction, etc.).
+
+The second solution looks more promising than the first one, since it
+integrates better with the overall architecture and it does not have a
+negative impact on other requirements.
+
+**Required work**.
+
+The Grilo Tracker plugin will need to be modified to operate as
+specified in the solution, and it actually depends on requirements R8
+and R13 related to Tracker scheduling. Additionally, an API would need
+to be provided to easily change from one hierarchical model to the other
+at run-time. See [][Grilo Media Source Plugins] for more information
+about Grilo.
+
+**Status**. Satisfied.
+
+#### **Notification on metadata changes**
+
+**Requirement R3.** Metadata info can change at run-time, so the
+media browsing API has to notify whoever is interested through some
+mechanism when these changes happen.
+
+**Analysis**. Since the indexing process is asynchronous, it can happen
+that media content gets its metadata updated while the content is
+already being shown to the user.
+
+Tracker internally uses the file system monitor service provided by the
+Linux kernel, which is a very efficient way to get notified about
+changes on the filesystem without doing active polling.
+
+Once Tracker Miner gets notified about a change in the filesystem, it
+will check what needs to be done depending on the specific type of
+change. For example, if a new file is added it will determine whether
+the new file is interesting for Tracker or not, much in the same way it
+does when crawling through the filesystem looking for files to index. In
+the case of a notification of a deleted file, it would remove its
+associated information in the Tracker Store. In the case of modified
+files, it would extract the information again.
+
+**Solution**: Grilo tracks changes in the Tracker Store by subscribing
+to the **GraphUpdated** D-Bus signal from the Tracker Store service (see
+[][Tracker storage] for more details). Grilo processes this information
+and provides notifications of changes on media content. See the
+following illustration for an overview of the interaction between the
+components involved.
+
+**Status**. Satisfied.
+
+
+
+#### **Paged queries**
+
+**Requirement R4**. Provide queries to request content information in
+pages of fixed size.
+
+**Analysis**. There are potentially lots of results in a query for
+browsing media content. Therefore, a mechanism to get the results
+incrementally as needed is required.
+
+**Solution**: Grilo supports paging in all requests via skip and count
+numbers. Internally Grilo uses both mechanisms provided by Tracker
+SPARQL (the OFFSET / LIMIT modifiers in SELECT SPARQL statements and
+TrackerSparqlCursor). See [][Grilo] for details on Grilo.
+
+**Status**. Satisfied.
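+
+As an illustration of the underlying paging mechanism, the same OFFSET /
+LIMIT modifiers that Grilo generates can be tried directly with the
+`tracker-sparql` command-line tool. This is a hedged sketch: the property
+and class names follow the Nepomuk ontologies shipped with Tracker, and a
+stable ORDER BY is assumed so that consecutive pages do not overlap.
+
+```sh
+# Fetch the third page (20 items per page) of all indexed music files.
+tracker-sparql -q "
+  SELECT nie:url(?song) WHERE { ?song a nmm:MusicPiece }
+  ORDER BY nie:url(?song)
+  OFFSET 40 LIMIT 20"
+```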
+
+### **Media Indexing Database Requirements**
+
+#### **Media indexing of shared and private files**
+
+**Requirement R5**: The system must be capable of indexing shared and
+private files. Shared files can be accessed by all users in the system.
+Private files are only accessible to the user who created them
+initially.
+
+**Analysis**. The reason for this requirement is to guarantee a minimum
+level of data confidentiality among the users in the system (for example
+regarding personal photos and documents). This would be even more
+important if we consider that Tracker could be used to store other
+information as well.
+
+We assume there are folders which are public (shared and accessible to
+all users in the system) and folders which are private (only accessible
+to the owner of the folder). Due to the existence of private content,
+each user must have their own Tracker database for storing metadata.
+
+In the future, the device may have different configurations for privacy.
+In the first case, all user files would be public, and they should be
+available for indexing by all other users. In the second case, each
+user's files would be private. In a third case, the user would be
+prompted to choose which files to make public; those public files should
+be available for indexing by all.
+
+**Solution**. Due to Tracker's architecture, it is neither easy nor
+efficient to add the capability to have more than one database managed
+by a Tracker instance. Due to the nature of SPARQL queries, it would
+require very complex database joins, and performance would suffer;
+SQLite is known to be very slow in such a setup. Additionally, Tracker
+developers are not keen on accepting this change, since Tracker had a
+similar behavior in the past, and it was abandoned due to multiple
+problems. Therefore, this would probably produce a fork of the Tracker
+version in the middleware, and it would hugely increase the maintenance
+cost. In summary, Tracker managing multiple databases does not seem
+feasible for now.
+
+The proposed solution is to have just one Tracker instance for each
+user, which holds both the metadata for private files belonging to the
+user and the metadata for public files.
+
+A drawback of this solution is the additional space needed, since the
+metadata for the public files is stored in each Tracker instance. As
+local system storage in the automotive industry is very expensive, we
+can expect there will not be many public files to index. Additionally,
+the database space used to index those public files is really minimal
+(0.03%, as shown in the table in [][Tracker storage]) and the number of
+potential users in a system is very small. Files on removable storage
+will be treated as public files. The solution for indexing and
+thumbnailing will be covered in
+[][Indexing database on removable device].
+
+Another drawback is the extra processing required to index the public
+contents for each user. There is also some risk of overloading the
+system in this case, but that could be managed by the Tracker
+scheduler.
+
+In the case of the thumbnails, it is possible to share the thumbnail
+objects, since they are stored in files. Also note a Tracker instance
+would need to run for every user logged into the system; only Tracker
+Store and Tracker Miner though, not Tracker Extract, which automatically
+shuts down when idle.
+
+To handle future privacy configurations, file permissions should be set
+accordingly, and Tracker configured to index the files of all users.
+Thumbnails should be generated and stored in a central location where
+they can be retrieved by all Grilo instances. Also, AppArmor profiles
+should probably be tweaked to allow Tracker instances to read other
+users' files.
+
+**Status**. Satisfied.
+
+#### **Database version management**
+
+**Requirement R6**. The system should be able to cope with database
+version updates.
+
+**Analysis**. Database version updates are very tricky with Tracker,
+since the updates could happen at different levels:
+
+  - **SQLite database level**. Every effort is made to keep SQLite fully
+    backwards compatible from one release to the next. Rarely, however,
+    some enhancements or bug fixes may require a change to the
+    underlying file format. There are two types of updates, which can be
+    differentiated by comparing the version numbers of the old and new
+    libraries.
+
+    - First digit update on the version number. A reload of the database
+      will be required. Therefore, the contents of the database have to
+      be dumped into a portable ASCII representation using the old
+      version of the library and then reloaded using the new version of
+      the library. So we would need either a backup done with the old
+      version, or to have the old version distributed to do a dump of
+      the database. The last first-digit change was in June 2004.
+
+    - Second digit update on the version number. It is backwards
+      compatible, so newer versions will be able to read and write older
+      database files, but there is no guarantee of forward
+      compatibility. The last second-digit change was in July 2010.
+      Provided we want to upgrade to the new version, the update of the
+      database could be done with just the new version.
+
+  - **Tracker RDF mapping level and Ontology level**. The first is
+    related to the mapping from the RDF database model to a relational
+    database model (SQLite in this case). The second is related to
+    changes in the models defining the domains, objects, their
+    properties and links. Both of these changes are tracked by the
+    Tracker database version. If the version is different, then Tracker
+    must perform a full re-index, as there is no backwards
+    compatibility. However, by using the Tracker journal, it would just
+    be like a reload of information, since the journal is like a log of
+    all transactions done on the database. This does not guarantee all
+    the information will be retained, since due to changes in the
+    ontology, some data might be invalid in the new model. There is also
+    another way to cope with ontology changes, via ALTER TABLE directly
+    in SQLite, but this requires some custom coding and it is very
+    complex to handle all the cases of ontology changes. The last time
+    the Tracker database version was changed was in version 0.9.38
+    (February 2011). See [][Tracker storage].
+
+It is clear that changes in the Tracker database version are a larger
+risk than changes in SQLite. Let us analyze various scenarios:
+
+  - If the Tracker Store just holds indexing information, this could be
+    regenerated by re-indexing, so there would be no real data loss on a
+    database version update.
+
+  - If the Tracker Store keeps information entered by the user, like
+    user tags, then it would be lost during a full re-index. To prevent
+    this, an ad-hoc tool could be implemented to convert this
+    information to the new database version.
+
+  - Often the manufacturers or distribution maintainers decide not to
+    deploy new changes to the ontologies, to avoid these database update
+    problems.
+    Anyhow, some changes could be supported via some custom
+    code, like adding / removing properties; but others affecting the
+    domains or class hierarchy are much harder to handle. Each case of
+    ontology change needs to be analyzed individually.
+
+**Solution**. It is a bit of a case-by-case trade-off between storage
+space for the Tracker journal and CPU time for re-indexing. Assuming we
+cannot use unlimited storage space on the device, using the Tracker
+journal is not an option. The way to handle database version updates is
+to analyze them on a case-by-case basis. There are several points to
+evaluate, like the impact of the update on the existing database, the
+type of data involved (generated data vs user data), and the possible
+solutions to keep the data (either implementing ad-hoc tools to migrate
+the data or making use of already available tools).
+
+See more details on the Tracker journal in [][Tracker storage].
+
+**Status**. Satisfied.
+
+#### **Indexing database on removable device**
+
+**Requirement R7.** Storage of the indexing information for removable
+storage in the removable storage itself.
+
+**Analysis**. The main motivation for this requirement is to avoid using
+the scarce, expensive storage in the system. Here are some general
+problems and risks with this approach:
+
+  - **Data corruption**. The user can disconnect the removable device at
+    any time without properly syncing. For a holistic view on robustness
+    see the Robustness design document. Points to consider:
+
+    - Risk of corruption for user files and filesystem metadata. The
+      device could have been ejected in the middle of a write
+      operation. The device would not be usable unless its filesystem
+      is recovered, and the user could lose some or all of the files.
+
+    - Journalled filesystems work more reliably, guaranteeing at least
+      that the filesystem will not be left in an inconsistent state. In
+      any case, it is the user who chooses the filesystem for their own
+      USB flash devices, and not the system, so there is not much to do
+      here, since the FAT filesystem is the de facto standard used on
+      USB flash devices, and it is not a journalled filesystem. Another
+      point is that USB flash devices are typically optimized for FAT
+      filesystems.
+
+    - Disabling the write cache for the USB flash device decreases the
+      data corruption risk, but the risk does not disappear: the user
+      could still eject in the middle of a write operation. As a
+      result of the disabled cache, write operations will be slower.
+      Additionally, USB flash manufacturers tend to lie regarding sync
+      requests.
+
+    - Note: the size of thumbnails has not been considered in this
+      section, since the thumbnail storage is independent from the
+      metadata storage. However, as we can see in the modeling
+      spreadsheet, the size of the thumbnails is really significant,
+      even more than the metadata size, so it would most probably make
+      sense to store thumbnails and album art in the USB flash device.
+      Therefore the risk of data corruption cannot be avoided in the
+      end, just minimized.
+
+**Solution.** The alternative of using a dedicated metadata database on
+removable storage devices was discarded due to data corruption and
+maintainability problems. However, thumbnails and album art will be
+stored in the removable storage. That is a large portion of the
+metadata, and will help save local storage space.
+
+A single Tracker instance per user, in local storage, will hold the
+metadata for media content on the USB flash devices.
+
+The thumbnails and album art will be stored in the USB flash device. As
+we saw before, any write to a USB flash device could end up in
+corruption if the user does not behave correctly. A check should be
+added when generating thumbnails to use local storage when the removable
+device is full.
+
+*Note: in the current implementation, if the device does not have enough
+free space, thumbnails will still be generated; album art will be
+generated in the local storage cache.*
+
+The disk space usage can be controlled by removing the metadata of
+unmounted external devices when the disk space is low and/or when the
+database size exceeds a given limit.
+
+Currently Tracker removes such metadata only after 3 days, and when the
+disk space is low, the indexing engine simply stops. A trigger shall be
+added to remove metadata if the disk space is low, starting with data
+from removable storage devices.
+
+Also, the database size is unlimited by default. A limit will be set, to
+prevent waste of local disk space, and the database will purge old data
+when the limit is hit.
+
+**Status**. Satisfied.
+
+### Indexing Scheduling
+
+There are many specific requirements related to metadata extraction
+prioritization. They will be analyzed in detail in the following
+subsections.
+
+The Tracker Scheduler will need modifications to be able to specify
+priorities, as well as to separate the operations into different stages.
+Additionally some extra hooks might be needed in order to provide hints
+from the browsing applications. There are several ways to implement this
+prioritization. One way would be an API that allows the application
+to explicitly give priority to certain operations or use cases. Another
+way would be a heuristic approach based on recent queries made to the
+media framework. This automatic approach, although initially
+interesting, looks a bit risky, as there could be unpredictable
+interactions between applications. See [][Tracker scheduling] for more
+details on how Tracker scheduling works in the upstream version.
+
+The following illustration shows an overview of how the scheduling
+and prioritization of indexing operations works. There is a main
+component, the Tracker Filesystem Miner, feeding the task queues.
+Generation of new tasks is based on previous queries, filesystem events
+(e.g. a new file created) and the results of crawling the filesystem.
+Tasks are consumed from the queues by different components in order: the
+lower the priority value, the earlier a task gets executed. The priority
+of a task is determined by the type of task, which defines the queue
+where the task belongs. Additionally, tasks resulting from recent
+queries are normally placed at the front of the queue, since they will
+most likely be a result of user interaction. Also note this design
+allows some configuration regarding the types of tasks and their
+priorities, as well as testing other ideas during development.
+Requirement R12 has more details about the abstraction of different
+types of tasks in the queues.
+
+
+
+#### Media Content Counters
+
+**Requirement R8**. Provide the number of items per content type as soon
+as possible.
+
+**Analysis.** To determine the number of items per content type, all
+files must be crawled first, and their mime types must be determined. A
+full metadata extraction is not needed to determine the mime type, but
+in some cases the first few bytes of a file might need to be read (see
+the Q\&A for more details about determining the mime type).
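+
+As a small illustration of the content sniffing mentioned above, the
+`file` tool reports a mime type by reading the first bytes of a file (its
+"magic numbers") rather than trusting the file extension; the filename is
+just an example:
+
+```sh
+# Reads the first bytes of the file to classify it, regardless of its name.
+file --brief --mime-type song.mp3     # prints e.g. audio/mpeg
+```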
+
+Tracker crawls the filesystem for new files to be indexed, and adds
+these files to an internal queue. Each time a file from the queue is
+processed, there are two steps. The first step, which is done by the
+Tracker Filesystem Miner, gathers metadata from the filesystem
+attributes without actually inspecting the file contents. In a second
+step, more information is extracted by Tracker Extract by inspecting the
+file contents, which is a more expensive operation. These steps are done
+for every file processed. However, to meet the requirements above, we
+would perform the first pass for all the items found before starting the
+second pass for every item.
+
+**Solution.** Collabora will add an option in Tracker's configuration to
+enable two-pass indexing. If enabled, Tracker will first crawl the whole
+filesystem to store the files' attributes, but won't try to get embedded
+information (e.g. MP3 metadata, etc.). A boolean property will be added
+in Tracker's database for files that need a 2nd pass, so Tracker knows
+which files need a 2nd pass when it is done crawling the filesystem.
+That property needs to be written into the database (and not only
+in-memory) so Tracker is able to correctly resume its indexing after a
+system reboot. Additionally, directories containing partially indexed
+files will be flagged (in memory), to avoid re-crawling the whole
+filesystem when doing the 2nd pass (a list of all partially indexed
+files would be too big and consume too much memory).
+
+This solution has been discussed with upstream developers and has a good
+chance of being accepted.
+
+**Status**. Satisfied.
+
+#### Prioritized extraction per content type
+
+**Requirement R9**. Prioritize metadata extraction per content type:
+first music playlists, then music, video, pictures and documents. The
+default prioritization can be adjusted at run-time depending on user
+activity, e.g. if the user starts browsing pictures.
+
+**Analysis**. The current Tracker scheduling does the metadata
+extraction in no specific order.
+
+**Solution**. A D-Bus interface will be added to Tracker. That interface
+will be used by applications to tell Tracker about their current
+priorities. For example, a music application will ask Tracker to index
+“audio/\*” mime-types first.
+
+If an application requests priority for a certain mime-type, Tracker
+will skip any other file while crawling the filesystem. Additionally,
+directories containing skipped files will be flagged (in memory), to
+avoid re-crawling the whole filesystem when Tracker is done indexing all
+files that have the priority (a list of all skipped files would be too
+big and consume too much memory).
+
+When Tracker is done crawling the whole filesystem, it will do the 2nd
+pass indexing (see [][Media content counters]) on the files that have
+the priority (e.g. if the music application is running, the 2nd pass is
+done only on audio files at this point). When done, it will do the 2nd
+pass on all files, ignoring the filters.
+
+If an external storage device is plugged in while Tracker is doing the
+2nd pass, it stops and crawls the new media first (doing the first pass
+on prioritized files). When done, Tracker will resume the 2nd pass.
+
+If priorities change while Tracker is doing the 2nd pass, it stops and
+crawls the directories where files were skipped earlier. When done,
+Tracker will resume the 2nd pass.
+
+In summary, Tracker will do the 1st pass indexing (file attributes only,
+no embedded metadata) on prioritized files, then the 2nd pass on
+prioritized files, then the 1st pass on non-prioritized files, and
+finally the 2nd pass on non-prioritized files.
+
+This solution has been discussed with upstream developers and has a good
+chance of being accepted.
+
+**Status**. Satisfied.
+
+#### Selective prioritized extraction
+
+**Requirement R10**. Prioritize metadata extraction for certain files,
+e.g. music files currently shown to the user.
+
+**Analysis**. The goal is to influence the scheduling of extract
+operations in Tracker based on user behavior. For example, if a user
+is browsing a specific folder in the filesystem, the metadata extraction
+of the files currently displayed to the user must have priority over
+others. Additionally, the system could anticipate the needs of the user,
+by trying to extract metadata for the media content items in the next
+page. This can be done by influencing the priority of extract operations
+in Tracker by checking the results of recent queries.
+
+**Solution**. The D-Bus interface proposed in the solution to R9 will be
+extended to let applications give priority to some specific files,
+in addition to the general mime-type priority.
+
+The following would be implemented as part of the solution:
+
+  - **Extract normal**. The current behavior, that is, without automatic
+    prioritization of extraction based on queries.
+
+  - **Extract recent**. This will automatically request the metadata
+    extraction for media content items returned in recent queries.
+
+  - **Extract next**. This will automatically request the metadata
+    extraction for media content items that would result from the next
+    page of recent queries. This setting implies "Extract recent" as a
+    dependency.
+
+  - **Extract thumbnail**. This will automatically request the thumbnail
+    computation for media content items returned in recent queries (or
+    next page items if "Extract next" is also set).
+
+The application or SDK layer would be responsible for enabling the
+settings most appropriate for each specific case. Alternatively, Grilo
+could have "Extract recent", "Extract next" and "Extract thumbnail"
+enabled by default. This is a trivial change that could be decided later
+during the development phase.
+
+The solution needs to be discussed in more detail with upstream Tracker
+maintainers.
+
+**Status**. Satisfied.
+
+#### Selective prioritized thumbnailing
+
+**Requirement R11**. Prioritize thumbnails depending on user activity.
+
+**Solution**. This is already covered by requirement R10.
+
+**Status**. Satisfied.
+
+#### Multi pass metadata extraction
+
+**Requirement R12**. Iterative process for metadata extraction in
+multiple passes: blank entry with just file names, textual information,
+graphical information like thumbnails, information from the internet,
+etc.
+
+**Solution**. The solution proposed in [][Media content counters]
+already describes two-pass indexing. A third pass can be added in the
+same way to create thumbnails, get information from the internet, etc.
+
+The solution needs to be discussed in more detail with upstream Tracker
+maintainers.
+
+Collabora proposes Tumbler to generate and manage the thumbnails (but
+not to schedule the thumbnailing). In the current version, Tumbler
+provides a D-Bus service with schedulers to manage the thumbnails.
+Tumbler does not do any crawling to look for content to be thumbnailed;
+Tracker will request thumbnailing operations from Tumbler.
+Although Tumbler has several schedulers to keep track of the
+thumbnailing requests with different priorities, it will be Tracker that
+takes care of the scheduling.
+
+Thumbnail calculation is particularly expensive in CPU and storage
+resources. See the table in [][Thumbnail management] for more detailed
+information.
+
+**Status**. Satisfied.
+
+#### **Concurrency configurable**
+
+**Requirement R13.** The scope (e.g. quantity of extracted data) within
+one step must be configurable, grabbing the data concurrently for
+multiple files.
+
+**Solution**. Tracker has a scheduler priority parameter which allows
+issuing new operations when the CPU is idle. Additionally there is an
+internal setting for the task pool limit, which controls the number of
+concurrent tasks that can run at the same time. Currently this value is
+hard-coded to one, but it could be exposed via configuration or made
+dependent on the number of cores in the system, depending on Apertis'
+needs. Additionally there is support for adjusting the amount of work
+done concurrently, in order to avoid overloading the system. This is set
+by the throttle parameter, which basically allows specifying how many
+extract operations can be carried out per second (see [][Tracker miner]
+for more details on throttle and scheduler priority).
+
+The operations handled by the scheduler have a small granularity (a
+single file), so it is expected that the whole system can react in time
+to get in and out of the idle state. The management of the idle state is
+done directly by the kernel, by setting the appropriate input / output
+priorities and CPU priorities to idle. Additionally, a specific cgroup
+could be set up to have more control over the resources used for media
+indexing.
+
+The solution needs to be discussed in more detail with upstream Tracker
+maintainers.
+
+**Status**. Satisfied.
+
+### Thumbnailing
+
+#### **Two-step thumbnailing**
+
+**Requirement R14.** Provide an additional iteration to generate
+metadata which is not already embedded within the content, such as
+thumbnails for pictures. First, use a very fast algorithm (time beats
+quality). At a later time, use a better, more time-consuming algorithm.
+
+**Solution**. This is dependent on requirement R12. The Thumbnailer
+service already supports several flavors for a thumbnail. It currently
+provides a normal and a large size, which could fulfill this requirement
+by using different algorithms for each size.
+
+The solution for requirement R12 includes an abstract mechanism to add
+additional passes. The first and second pass for thumbnail extraction
+could be considered additional passes to be configured in this abstract
+mechanism. This mechanism will provide enough flexibility to connect to
+different algorithms.
+
+The solution needs to be discussed in more detail with upstream Tracker
+maintainers.
+
+**Status**. Satisfied.
+
+#### **Thumbnail resolution configuration**
+
+**Requirement R15.** The resolutions for the normal and high thumbnail
+flavors must be configurable.
+
+**Analysis**. Currently the resolution sizes are hard-coded in the
+Tumbler source code.
+
+**Solution**. The list of flavors for thumbnails, as well as their
+resolutions, will be exposed through configuration files or via an API.
+
+**Status**. Satisfied.
+
+#### **Thumbnailing algorithm configuration**
+
+**Requirement R16.** The algorithm used for calculating the thumbnails
+must be configurable.
+
+**Analysis**. Currently Tumbler implements several plugins for thumbnail
+calculation.
+
+**Solution**. It is possible to add new plugins with specific algorithms
+or to modify existing plugins to use other algorithms. The algorithm
+used for thumbnailing should be configurable. As an example, see the
+list of algorithms currently available through the
+gdk\_pixbuf\_scale() functions:
+
+  - **Nearest**: nearest neighbor sampling. This is the fastest and
+    lowest quality mode. Quality is normally unacceptable when scaling
+    down, but may be OK when scaling up.
+
+  - **Tiles**: this is an accurate simulation of the PostScript image
+    operator without any interpolation enabled. Each pixel is rendered
+    as a tiny parallelogram of solid color, the edges of which are
+    implemented with antialiasing. It resembles nearest neighbor for
+    enlargement, and bilinear for reduction.
+
+  - **Bilinear**: best quality/speed balance; use this mode by default.
+    For enlargement, it is equivalent to point-sampling the ideal
+    bilinear-interpolated image. For reduction, it is equivalent to
+    laying down small tiles and integrating over the coverage area.
+
+  - **Hyper**: this is the slowest and highest quality reconstruction
+    function. It is derived from the hyperbolic filters in Wolberg's
+    "Digital Image Warping".
+
+**Status**. Satisfied.
+
+### DLNA (UPnP)
+
+**Requirement R17**. Browsing DLNA (Digital Living Network Alliance)
+media sources.
+
+**Analysis.** There will be a player application in the Apertis platform
+to access and control DLNA media sources. This application plays the
+role of Controller in the DLNA spec; it would be able to browse the
+media collection of remote Media Servers. This information is provided
+by the Content Directory service on the Media Server. The information
+provided about media content includes metadata like name, artist, date
+created, size, album art, etc., as well as the protocols and data
+formats supported by the server for that particular content item.
+
+For more specific details on these topics see the UPnP AV (Universal
+Plug And Play Audio Video) architecture [documentation][upnp-architecture].
+
+Metadata indexing of media content on remote Media Servers is not
+required. Indexing is normally not desirable, since enough metadata is
+provided by the Content Directory service for browsing purposes, and
+local storage is scarce. Besides, the amount of storage needed could in
+practice be very high due to the usage of remote sources.
+
+Providing the Media Server and Media Renderer roles is out of scope for
+this document.
+
+**Solution**. Collabora proposes the GUPnP framework to fulfill the
+requirements. The GUPnP library implements the UPnP specification:
+resource announcement and discovery, description, control, event
+notification, and presentation. On top of that, the GUPnP-AV library is
+a collection of helpers for building AV (audio/video) applications using
+GUPnP. The GUPnP framework is licensed under LGPL v2.1 and it is written
+in C using GObject and libsoup. GUPnP is entirely single-threaded
+(though asynchronous) and integrates with the [*GLib*](http://gtk.org/)
+main loop.
+
+**Status**. Satisfied.
+
+### Online Media Sources
+
+**Requirement R18**. Access to online media sources.
+
+**Analysis**. Depending on the actual media source, the specific
+functionality and the API style provided will be different. For example,
+Google services like YouTube and Picasa are accessed through the Google
+Data Protocol. In general, most of these media sources are based on
+REST interfaces.
+
+**Solution.** With few exceptions, like **libgdata** for the Google Data
+Protocol, there are not many good FOSS options to access specific online
+media source providers. However, in the worst-case scenario we could use
+librest and libsoup, which are described in [][Librest and Libsoup].
+
+**Status.** Satisfied.
+
+### Bluetooth AVRCP
+
+**Requirement R19**. Browsing of media content from Bluetooth devices.
+
+**Analysis**. Bluetooth AVRCP 1.4 allows browsing the media content on a
+Bluetooth device. Indexing of this content is not required.
+
+**Solution**. This can be implemented by using the BlueZ API. The exact
+status of the AVRCP 1.4 implementation will be covered in more detail in
+the Connectivity design document.
+
+**Status**. Moved to Connectivity design.
+
+### **Playability check**
+
+**Requirement R20.** Playability check. Determine whether a file is
+playable or not.
+
+**Analysis**. We want to avoid showing the user a file which cannot be
+played. It is not enough to do this through simple mime type checking,
+since it might lead to false positives. A minimal check for corruption
+and codecs is required.
+
+**Solution**. The playability check has two steps:
+
+1\) At indexing time. During the Tracker indexing process, Tracker
+Extract is able to extract information about the mime type and the
+audio / video codec of a media content file. Additionally, the Tracker
+Extract process should be able to mark a file in the Tracker Store if
+any corruption is found in it during the process of metadata
+extraction.
+
+As an example, something similar happens during the thumbnail extraction
+for a video file: corruption, or the inability to decode a frame, could
+be detected when trying to decode a specific frame to use it as a
+thumbnail. Such a file would be marked as corrupted in the Tracker
+Store.
+
+Although the last example was about a video file, this applies to other
+types as well, like audio files, and in general to any file where
+metadata extraction makes sense. The metadata extraction process will be
+responsible for marking those files as corrupted when it was not
+possible to extract metadata from them.
+
+Tracker has the flexibility to change extract plugins or add new ones.
+Therefore, it will be possible to customize or replace the plugins with
+more robust ones if needed.
+
+2\) At browsing time. There are some checks to perform on media content
+files before showing them to the user: check that the file is not marked
+as corrupted, that the file is of a known mime type, and that a
+compatible decoder for the codec of the audio / video file exists in the
+system. The list of available codecs can be obtained through the
+GStreamer registry.
+
+There is a special case at browsing time, where the required metadata is
+not available yet (probably because the file has not been processed
+yet). In this case, the default would be to show the file until the
+metadata is retrieved.
+
+The solution comprises changes in two layers: Tracker (mostly Tracker
+Extract) for the metadata retrieved at indexing time, and also a higher
+level which uses that information to determine whether the file is
+ultimately playable or not.
+
+Note that the system is not 100% safe, since to guarantee playability we
+would have to decode all the frames.
+
+Additionally, applications will be able to mark specific files as
+non-playable for those cases where playability cannot be determined
+until playback time.
+
+The solution needs to be discussed in more detail with upstream Tracker
+maintainers.
+
+**Status**. Satisfied.
+
+## Appendix: Media Management Technologies
+
+This chapter is focused on describing the **current status** of the
+various technologies, without including the specific additions or
+modifications discussed in the requirements, which are covered in
+[][Solution]. Therefore, some of the technologies do not fully satisfy
+the requirements yet in their current state; the modifications or
+additions needed to make them work as desired are described in
+[][Solution].
+
+### Tracker
+
+**[Tracker]** is a semantic data storage for desktop and mobile
+devices. A semantic data storage is basically a central repository of
+user information, which stores relationships between pieces of data in a
+way that is re-usable among multiple applications.
+
+The concept is quite wide and applicable to different types of
+information like pictures, messages, etc. But this document is just
+focused on media content, the indexing of which is one of Tracker's
+primary functions.
+
+Tracker makes use of several existing technologies and standards:
+
+  - **Resource Description Framework ([RDF])**. RDF is a
+    directed, labeled graph data format for representing information,
+    and is a W3C standard.
+
+  - **[SPARQL]** is a W3C standard defining a query language for
+    databases, able to retrieve and manipulate data stored in RDF
+    format.
+
+  - **[Ontologies][ontology]**. An ontology represents knowledge as a
+    set of concepts within a domain, and the relationships between those
+    concepts. It can be used to reason about the entities within that
+    domain and may be used to describe the domain.
+
+  - **[Nepomuk][ontologies]** (Networked Environment for Personalized,
+    Ontology-based Management of Unified Knowledge). Nepomuk is a
+    research project, which defined a set of ontologies describing
+    desktop entities like files, pictures, etc.
+
+Tracker is a data store, an indexer and a search engine that allows the
+user to find and link data easily. Tracker is typically used for
+searching the local storage. By default Tracker comes with several
+indexing services called "miners". Tracker is made up of several
+components:
+
+  - **Tracker Storage**. The data store and daemon to interface to
+    Tracker's databases.
+
+  - **[Tracker SPARQL]**. The libtracker-sparql library is the
+    foundation for Tracker querying and for inserting data into the data
+    store based on the Nepomuk ontology.
+
+  - **[Tracker Miner]**. The libtracker-miner library is the
+    foundation for Tracker data miners. These miners will extract
+    metadata and insert it in SPARQL form into the Tracker store,
+    following the Nepomuk ontologies. Developers can add new miners in
+    order to index new data sources.
+
+  - **[Tracker Extract]**. The libtracker-extract library is the
+    foundation for Tracker metadata extraction of embedded data in
+    files. Tracker comes with extractors written for the most common
+    file types (like MP3, JPEG, PNG, etc.). However, for rarer formats,
+    it is possible to write plugins to extract the metadata.
+
+Ubuntu 12.04 currently has Tracker version 0.12.10, while the Apertis
+platform was using 0.10.6. Between these versions many fixes have been
+made, as well as some enhancements and improvements, but nothing really
+substantial. The performance of several components, especially the
+Tracker filesystem miner, has improved in the 0.12 release. The
+limitations of Tracker are exposed in the context of the requirements in
+[][Solution].
+
+The preferences for each Tracker component can be managed through
+GSettings, although there is also a UI application (tracker-preferences)
+which is not of interest in the scope of this project.
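+
+As a hedged illustration, miner settings such as the ones described in
+[][Tracker miner] below can be inspected and changed from the command
+line; the schema and key names here correspond to Tracker releases of
+this era and may differ between versions:
+
+```sh
+# List all filesystem miner settings with their current values.
+gsettings list-recursively org.freedesktop.Tracker.Miner.Files
+
+# Slow indexing down (0 = as fast as possible, 20 = slowest) and stop
+# indexing when less than 5% of disk space is free.
+gsettings set org.freedesktop.Tracker.Miner.Files throttle 5
+gsettings set org.freedesktop.Tracker.Miner.Files low-disk-space-limit 5
+```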
+
+#### Tracker Storage
+
+The Tracker storage is divided into several parts, as shown in the
+following illustration.
+
+
+
+  - The public **libtracker-sparql** is the API layer used by the
+    applications to access the Tracker storage using SPARQL. Internally,
+    it uses the D-Bus interface when write access to the database is
+    required. However, it allows more direct access to the database
+    for read-only access (through libtracker-data), which reduces the
+    D-Bus traffic.
+
+  - The **Tracker store daemon (tracker-store)** provides a D-Bus
+    interface to access the RDF storage, and it also provides a
+    mechanism to notify when changes happen in the RDF storage.
+
+  - **libtracker-data** is the library interfacing directly with the
+    SQLite database, used by both tracker-store and libtracker-sparql.
+
+The ontologies related to media content which are supported by Tracker
+are listed below:
+
+  - Nepomuk File Ontology (nfo).
+
+  - Nepomuk ID3 (nid3).
+
+  - Nepomuk MultiMedia (nmm).
+
+See below more details about the storage needs of Tracker:
+
+  - **[SQLite]** **database**. The common configuration is to have
+    separate Tracker storage for each user. However, this can be set up
+    depending on the requirements of the platform, by changing the
+    environment variable XDG\_CACHE\_HOME, as the Tracker SQLite
+    database is stored in $XDG\_CACHE\_HOME/tracker. Here are some rough
+    numbers on SQLite database space usage:
+
+    - Empty SQLite database. The database with initialized data, but
+      without indexing files, requires about 1.2 Mbytes.
+
+    - Indexing Photos. As an approximate figure, our measurements show
+      about 800 Kbytes of database size is used for every 500 photos
+      (approx. 3 Gbytes of media). Note, the size in Gbytes is just an
+      approximate figure, since the amount of metadata scales with the
+      number of media items and not with their size.
+
+    - Indexing Music. As an approximate figure, our measurements show
+      about 800 Kbytes of database size is used for every 300 mp3
+      songs (3 Gbytes of media).
+
+  - **Write Ahead Log ([WAL]) files**. The Tracker database is
+    stored in SQLite using WAL. The WAL option allows better
+    performance, concurrency and reliability, at the cost of consuming
+    extra disk space. This file is part of SQLite and is limited to
+    10,000 pages maximum, i.e. a maximum of 10 Mbytes. Furthermore, this
+    space is temporary, since it will get deleted as soon as the
+    database is checkpointed, which happens automatically or when the
+    limit is reached. There is an additional, relatively small file for
+    shared memory, but that is transient and it does not even use disk
+    space, just memory.
+
+  - **Ontologies**. The file ontologies.gvdb is stored in the same
+    directory as the SQLite files. It is about 350 Kbytes, created on
+    initialization. The size does not depend on the data indexed, but on
+    the ontology models.
+
+  - **Tracker Journals**. The journal stores all inserts, updates and
+    deletes. Basically it is a file that grows without bound, a reason
+    why it has received some criticism. It is meant for data redundancy
+    and backup. The journal is also used to cope with ontology changes.
+    It can be disabled at compile time. In fact, it was disabled on the
+    Nokia N9, mainly due to the ever-growing problem and privacy.
+    The Tracker journal can be a reasonable choice for a desktop system,
+    but on embedded devices it is better disabled. It is stored in the
+    $XDG\_DATA\_HOME/tracker/data directory.
+
+| Tracker Use Case         | Media in GiB | Index in MiB | Index in % |
+| ------------------------ | ------------ | ------------ | ---------- |
+| Empty database           | 0 GiB        | 11.5         | NA         |
+| 500 photos or 300 songs  | 3 GiB        | 12.3         | 0.4 %      |
+| 5K photos or 3K songs    | 30 GiB       | 19.5         | 0.06%      |
+| 5K photos and 3K songs   | 60 GiB       | 27.5         | 0.04%      |
+| 83K photos and 50K songs | 1000 GiB     | 277          | 0.03%      |
+
+> Tracker use cases for storage utilization
+
+Note: at the time of this writing, Ubuntu 12.04 was using SQLite 3.7.9
+(November 2011), while the latest stable version available is 3.7.10
+(January 2012).
+
+Here are some configuration parameters for the Tracker Storage:
+
+  - **Tracker DB Journal size**. Size of the journal at rotation. By
+    default 50 Mbytes.
+
+  - **Tracker DB Journal rotate destination**. Where to store the
+    journal chunk when it hits the maximum size.
+
+#### Tracker Miner
+
+Tracker miners are responsible for finding content to index. Although in
+the context of this document we are normally just interested in files,
+it could be any resource able to be stored in Tracker. Tracker already
+comes with a filesystem miner. Additionally, other miners can be
+implemented for specific data sources (either local or remote).
+Here are some configuration parameters for the filesystem miner:
+
+  - **Startup wait time**. Primarily to prevent Tracker from
+    heavily loading the system just after boot. By default 15 seconds.
+
+  - **Scheduler priority**. Specifies the priority of indexing
+    directories and files. There are three levels: when idle, first
+    indexing on idle (default) and anytime.
+
+  - **Throttle**. Controls the throttle of file indexing operations,
+    and with it the overhead indexing has on the system.
+    Of course, it is a trade-off between system load and speed, but it
+    can be tuned to make UI applications more responsive. It is a value
+    between 0 and 20, the higher the slower. A value of 0 denotes "as
+    fast as possible" (default), any other number N denotes 20/N
+    indexing operations per second. These limits can of course be
+    adjusted internally.
+
+  - **Low disk space limit**. A configurable parameter to stop indexing
+    in case of low free disk space. It is configurable between 0% (no
+    limit) and 100%. It is 1% by default.
+
+  - **Crawling interval**. Specifies the interval in days to check
+    whether the filesystem is up to date with the database. A value of
+    -1 specifies the check should only be done on unclean shutdowns and
+    -2 specifies this check should be disabled entirely.
+
+  - **Removable days threshold**. Specifies the threshold in days after
+    which metadata for files from removable devices will be removed if
+    their filesystem is not mounted. Zero means never. Configured to 3
+    days by default.
+
+  - **File monitoring**. Option to track filesystem changes directly in
+    order to know what needs to be indexed.
+
+  - **File Writeback**. Option to write information back into the files,
+    e.g. metadata retrieved from other sources or updated by the
+    application can be stored back in the original file. It is
+    limited to a few formats currently.
+
+  - **Index Removable Devices**. Option to enable / disable the indexing
+    of removable devices.
+
+  - **Index Optical Discs**.
+    Option to enable / disable the indexing of
+    CDs, DVDs, and in general any optical media.
+
+  - **List of directories to index recursively**. It can also refer to
+    special XDG directories like Desktop, Documents, Download, Music,
+    Pictures, Public, Templates and Videos.
+
+  - **List of single directories to index** (non-recursively). Same
+    notes as before.
+
+  - **List of ignored files**. Filenames can be specified with
+    wildcards.
+
+  - **List of ignored directories**. Wildcards can be used to specify
+    them.
+
+  - **List of ignored directories with content**. Avoid any directory
+    containing a file whose name is blacklisted in this list.
+
+The Tracker Miner Manager keeps track of the available miners and their
+current progress/status, and also allows basic external control of them,
+such as pausing or resuming data processing. It controls the scheduling
+of the different operations through the configuration parameters already
+specified above. The miner only does the crawling of files and the
+sequencing of the metadata extraction scheduling. The actual metadata
+extraction is accomplished by Tracker Extract, described in the next
+section.
+
+The most widely used miner is the filesystem miner, responsible for
+indexing local files. Other miners exist, like the UPnP miner, which
+indexes UPnP servers. The way to create new filesystem miners will not
+be shown in this document, since there is no requirement for it in this
+project.
+
+See a general overview in the following illustration.
+
+
+
+#### **Tracker Extract**
+
+Tracker Extract does the actual metadata extraction. It inspects the
+media content and extracts metadata, which is stored in the Tracker
+Store. There is a list of the currently
+[Tracker supported file formats][Tracker-formats]. It includes the main
+formats for all the media content types of interest (music, music
+playlists, video, pictures, picture albums and documents).
+
+**Note**: in some Tracker extract plugins, like the GStreamer one, the
+actual formats that can be extracted depend on the specific GStreamer
+plugins installed on the system.
+
+The extract plugins are built as dynamic libraries which are loaded at
+run-time. There is a text file to configure which mime types an extract
+plugin understands and which library file implements it. There are two
+types of extract plugins, specific and generic. Specific extractors are
+preferred if they exist, otherwise generic ones are used (e.g.
+audio/\*).
+
+In case more formats need to be supported, they can easily be added to
+Tracker by implementing extra plug-ins. They are relatively simple to
+implement; only the function **tracker\_extract\_get\_metadata()** has
+to be provided. For more details, check the example in the Tracker
+Extract [documentation][Tracker Extract-doc].
+
+Tracker Extract is a D-Bus daemon with a very simple interface, to get
+metadata and to cancel existing tasks. The Tracker Extract daemon can be
+configured to automatically shut down when idle after a certain period
+of time, freeing resources. Also, it detects extract operations that
+take too much time and aborts them.
+
+These are the configuration options for Tracker Extract:
+
+  - **Scheduler priority**. Specifies the priority of extracting
+    metadata. There are three levels: when idle, first indexing on idle
+    (default) and anytime.
+
+  - **Max bytes**. Maximum number of bytes to extract from text files.
    This is used just for text extraction (when full-text search is
    enabled), since it can make the index database grow significantly.
    The default is 1 MByte, and the maximum 10 MBytes.

#### Tracker Scheduling

Tracker employs several background processes: Tracker Store, Tracker
Miner and Tracker Extract. Tracker Miner and Extract do the heavier work
in an autonomous way and can potentially consume a lot of resources.
**Tracker Miner Manager** controls and monitors the Tracker Miners,
scheduling all their operations, including crawling the filesystem and
invoking metadata extract operations.

Tracker Miner and Extract can have their CPU scheduling priority
configured (as described before). The Tracker Store daemon does not need its
CPU priority configured since it works on demand; it must always be
running to process requests from user applications and other processes.
Additionally, all Tracker daemons have their I/O priority set to minimum, to
interfere as little as possible with other applications.

The Tracker Filesystem Miner sets up a filesystem notifier with the
directories to index. The filesystem notifier is responsible for finding
the directories and files to index, and for monitoring and notifying of any
changes. Tracker Filesystem Miner has several priority queues, one per
type of operation. Tracker Miner processes items from these queues when
it becomes idle. The priority of the types of operations, from highest to
lowest, is: writeback operations, deleted items, created items, updated
items, moved items.

After an operation is removed from its queue, it is added to the task
pool while it is running. The length of the task pool is checked before
adding new operations to it, to avoid overloading the system. The items
in the task pool are processed in several steps. Initially, the
information is captured without inspecting the file contents: properties
like MIME type, size, modification and creation time, etc. In a second
step, a request is made to Tracker Extract to extract more information
from the file.

Thumbnails are not requested by the Tracker Miner Manager. But if a file
with an existing thumbnail gets moved or deleted, the thumbnail will be
updated too (so the thumbnail file will be renamed or deleted as well).

### Thumbnail Management

The **[Thumbnail Managing Standard]** deals with the permanent
storage of previews for file content. The **[Thumbnail Management D-Bus
specification][thumb-dbus-spec]** is a standardized D-Bus API to deal with
thumbnailing. This D-Bus specification is currently implemented by
Tumbler, which has already been used successfully in consumer products
like the Nokia N9 phone. With a D-Bus specification for thumbnail
management, applications don't have to implement thumbnail management
themselves. If a thumbnailer is available they can delegate thumbnail
work to a specialized service. The service then calls back when it has
finished generating the thumbnail.

Thumbnailing is an expensive operation. Therefore, it is meant to be
requested by applications on demand, i.e. if an application needs a
thumbnail for a file, it should explicitly request it from the
Thumbnailer service.

Some features provided by the Thumbnailing service that can be
interesting in our context:

  - Provide the ability to handle different thumbnail flavors (sizes).
    By default two flavors exist:

    1. Normal, configured by default as 128x128.

    2. Large, configured by default as 256x256.

  - Possibility to implement thumbnailers for closed formats or with
    customized features.

  - The complexity of a LIFO queue and of setting I/O and scheduling
    priorities for background thumbnailing is no longer the
    responsibility of the application developer.

  - Extensibility with plug-ins. This is useful to support
    additional file types or when different interpolation algorithms
    are required.

There are several components in the Thumbnailer service:

  - **Thumbnailer**. Calculates the thumbnail for a specific file
    format.

  - **Thumbnailer Manager**. Keeps a registry of the Thumbnailers
    available at runtime.

  - **Thumbnail Cache**. This avoids regeneration of thumbnails when
    files are copied or moved, and cleans up the cache sporadically and
    when a file is deleted. This is managed automatically by the Tracker
    Filesystem Miner.

The thumbnails are stored in
*$XDG\_CACHE\_HOME/thumbnails/\[SIZE\]/(md5sum of original URI).png*.
Thumbnails for files on removable devices may instead be stored in a
*shared thumbnail repository* on the removable device, as
*.sh\_thumbnails/\[SIZE\]/(md5sum of original filename not including
path).png*, relative to the file. See §10 of the Thumbnail Managing
Standard.

One of the advantages of Tumbler is that the scheduler is abstracted;
two options are implemented: a background scheduler using a
first-in-first-out (FIFO) queue and a foreground one using a
last-in-first-out (LIFO) queue. Tumbler has been used successfully in
several environments including XFCE, Maemo and MeeGo. GNOME uses the
GnomeThumbnail API to generate thumbnails; EFL uses ethumb. Although
there are not many differences between the various thumbnailing
services, Tumbler is one of the most advanced since it is a real service
and not a library, and it provides scheduling features. Additionally,
Tumbler comes packaged for popular distributions like Ubuntu and Fedora,
and it has the extra advantage of already being integrated with Tracker,
as described in the previous section.

Tumbler can be extended with plugins to support new thumbnail types as
needed. There are already existing plugins for GStreamer, JPEG, fonts, a
large collection of image formats (GDK pixbuf), PDFs (libpoppler), etc.

> See the [plugin sources][tumbler-plugins-git]; to discover the MIME types
> they currently support you need to navigate to their
> `provider_get_thumbnailers` implementation, for example
> `gst_thumbnailer_provider_get_thumbnailers` in
> <http://git.xfce.org/xfce/tumbler/plain/plugins/gst-thumbnailer/gst-thumbnailer-provider.c>

Keep in mind that if a given format is not supported by Tumbler,
support can be added through its plugin API.

Video thumbnails can be generated using the GStreamer thumbnailing
plugin. This plugin already provides a heuristic method to extract the
thumbnail from a video stream, by selecting a frame with a wide
distribution of colors (to avoid presenting a title screen or other
essentially-blank frame).

It is worth keeping an eye on the disk space utilization of
thumbnails. After taking some measurements, we found that thumbnails
occupy about 13 kilobytes at the 128x128 pixel size, and about 29
kilobytes at the 256x256 size.

| Thumbnail Use Case | Media in GiB | Thumbnail size in MiB (normal + large = total) | Usage in % |
| ------------------ | ------------ | ---------------------------------------------- | ---------- |
| 500 photos | 3 GiB | 6.3 + 13.7 = 20 | 0.65 % |
| 5K photos | 30 GiB | 63.5 + 141.6 = 205.1 | 0.67 % |
| 166K photos | 1000 GiB | 2107.4 + 4701.2 = 6808.4 | 0.66 % |

> Thumbnail storage utilization

#### Media Art Storage

**[Media Art Storage]** provides a mechanism for applications to
store and retrieve artwork associated with media content, like music
from an album, the logo for a radio station, or a graphic representing a
podcast. The storage medium for artwork is the filesystem inside a
user's home directory, in $XDG\_CACHE\_HOME/media-art/. Tracker
manages and requests media art for albums and artists.

In some situations it is desirable to have a *local media art
repository* (for example, for read-only media or for USB removable
devices). The location for local media art will be a subdirectory named
.mediaartlocal/ within the same directory as the album's files.

Tracker already checks for media art present in the indexed folders.
Additionally, it is able to request the downloading of album art from the
album art provider installed in the system. There is already a FOSS
album art provider example using Google Images, but it can be replaced
by other implementations extracting album art from other sources, just by
implementing a D-Bus service with the interface
com.nokia.albumart.Requester.

Thumbnails of media art follow the Thumbnail Specification. The URI used
to determine the thumbnail path is the full URI pointing to the original
media art. For the path to the thumbnail, refer to the Thumbnail
Specification itself. A media art fetcher is allowed to store the normal
and large thumbnails immediately after the download of the media art is
completed. A media art fetcher is, however, not required to do this by
itself (the thumbnail infrastructure will or should take care of this if
the media art is not thumbnailed yet).

### Grilo

**[Grilo]** is a framework for browsing and searching media content
from various sources through a single API. Applications will be able to
browse and discover media content by using the Grilo API. This API will
provide media content and its metadata, and the GStreamer framework will be
able to play video or audio content (either local or remote).

Grilo offers a single, high-level API that abstracts the differences among
various media content providers, allowing application developers to
integrate content from various services and sources easily. Grilo comes
with a collection of plugins for accessing various media providers, like
Vimeo, Flickr, YouTube etc., so they can be presented uniformly via the
Grilo API. Additionally, a grilo-tracker plugin exists, which uses the
Tracker service (described in previous sections) to make media indexed by
Tracker available through the Grilo API.

There is an additional Grilo plugin for accessing the filesystem
directly (grl-filesystem), which checks for media content in a set of
configured directories. The defaults are the XDG user directories for
pictures, music and videos.

Although Grilo can be used to access many media content sources, we
suggest only using it for accessing local media content. The next
sections will dig into Grilo's details and its advantages.
The main
advantages of using Grilo instead of Tracker directly for this specific
use case are:

  - Tracker is a semantic data store, which can hold other
    bits of information apart from media indexing data, like
    messages, calendars, etc. In other words, it is a very general
    framework usable for many purposes. Therefore, it makes sense to
    provide a higher-level specialized API for media browsing (Grilo) on
    top of Tracker to hide its complexity from media applications.

  - Grilo has some plugins that might be useful to extract additional
    metadata, e.g. album art from last.fm. Grilo is especially
    recommended for accessing metadata from the Internet, which is
    not meant to be indexed. In addition, the platform could take
    advantage of future plug-ins which are planned to be developed by
    the FOSS community, like lyrics, moviedb.org, etc.

  - Grilo would support using an indexer other than Tracker if a better
    one becomes available. More importantly, applications wouldn't have
    to be modified to take advantage of such a change.

See the following illustration for an overview of the Grilo architecture.
Note that the boxes with grey background are not going to be used in the
context of the Apertis project.



#### Grilo Media Source Plugins

A plugin must create at least one GrlMediaSource instance, and
register it in the Grilo registry. A GrlMediaSource represents a
particular source of media. These plugins provide several functions:

  - **Search** content by keywords.

  - **Browse** the media content in a hierarchical way. It is similar
    to exploring a filesystem, entering folders (GrlMediaBox) and
    browsing the files in them.

  - **Query** allows access to content using a service-specific
    language. Normally it provides additional filtering capabilities.
    This is used by applications to support plugin-specific
    functionality.

  - **Metadata**, used to request additional metadata.

  - **Store** (optional), supports pushing content to the source.

  - **Remove** (optional), to remove stored contents from the source.

  - **Supported keys** provides information on which metadata keys are
    provided by the plugin. Typical metadata keys are: id, title, url,
    thumbnail, mime, artist, duration.

  - **Slow keys** (optional) provides information on which metadata keys
    are expensive to gather, so that applications can normally ask only
    for inexpensive keys, and request the slow keys when details of a
    particular media item are required.

  - **Media from URI**. Gets a GrlMedia from a URI. For example, a file
    browser may use this to get metadata for a specific file.

  - **Test Media from URI** (optional). To check if the plugin can
    convert a URI into a GrlMedia object.

  - **Notifications** on changes in media content.

At least one of the content retrieval methods is expected to be
implemented: search, browse or query. Each media content result of the
search/browse/query is represented by a GrlMedia object.

Plugins should be implemented in a non-blocking way to ensure a smooth
user experience in applications. Threads are not recommended either;
splitting work into chunks using the idle loop is encouraged.

There is a standard set of metadata keys defined, but plugins can define
their own custom metadata keys.

A GrlMedia can have multi-valued properties; for example a YouTube video
with different resolutions (and thus, different URIs).
It is also
possible to associate different properties with each URI of a GrlMedia.

#### Grilo Metadata plugins

Grilo metadata source plugins do not provide access to media content,
but to additional metadata information. An example would be providing
thumbnail information for local audio content from an online service.

Such a plugin must create at least one GrlMetadataSource instance, and
register it in the Grilo registry. The plugin provides several
functions:

  - **Resolve**: retrieves additional information for a GrlMedia
    object.

  - **May resolve**: to check if Resolve may be performed with the
    existing information.

  - **Set metadata** (optional): sets values such as the play count or
    the last time a media item was played.

  - **Writable keys** (optional): reports which keys can be stored.

  - **Supported keys**: provides information on which metadata keys
    are provided by the plugin.

  - **Slow keys** (optional): provides information on which metadata keys
    are expensive to gather, so that applications can normally ask for
    inexpensive keys, and only request the slow keys when details of a
    particular media item are required.

  - **Cancel operations**: cancels ongoing operations.

### Google Data Protocol

**YouTube**, like other Google services such as Picasa, uses the
**[Google Data Protocol][gdata]**. The Google Data Protocol is a
REST-inspired technology for reading, writing, and modifying information
on the web. The protocol currently supports two primary modes of access:
AtomPub and JSON. The JSON mode is a mapping of Atom items to JSON objects
meant to be used by web applications written in JavaScript.

The AtomPub mode is based on the Atom Publishing protocol, with
namespaced **XML** additions. Communication between the client and
server is broadly achieved through **HTTP** requests with query
parameters, with Atom feeds being returned with result entries. Each
*service* has its own namespaced additions to the GData protocol; for
example, the Google Calendar API has specializations for addresses and
time periods.

Collabora proposes **[libgdata]**, which is a library allowing
access to web services using the Google Data Protocol from traditional
applications. Results are always returned in the form of result *feeds*,
containing multiple *entries*. How the entries are interpreted depends
on what was queried from the service, but when using libgdata, this is
all taken care of transparently. The main dependencies of libgdata are
libsoup, libxml and liboauth.

Other frameworks and applications already use libgdata with
success, e.g. evolution-data-server, Totem's YouTube plugin and Grilo's
YouTube plugin.

The libgdata library already provides an implementation of
**[GDataYouTubeService]**, which provides the
following functionality:

  - Query videos.

  - Query videos related to a specific video.

  - Query standard feed types: top rated, top favorites, most viewed,
    most popular, most recent, most discussed, most linked, most
    responded, recently featured and watch on mobile.

  - Upload a video.

  - Get categories.

### Librest and libsoup

It is difficult to find libraries to access online media sources if they
are not provided by the vendors themselves. However, most of these
online media sources are based on the HTTP protocol with [REST]
interfaces. Therefore, in general, **[librest]** and/or
**[libsoup]** will be useful.
**Librest** is a library designed to
make it easier to access web services that are designed in a "RESTful"
manner. **Libsoup** is an HTTP client/server library for GNOME. It uses
GObjects and the GLib main loop to integrate well with GNOME
applications. Collabora can suggest or provide advice on open-source
ways of accessing these services on request. This is the most effective
way to access all the features.

### Playlists support

Playlists are supported in Tracker. There is a specific Tracker Extract
plugin to handle playlists, which internally uses the **[Totem
Playlist Parser]** library; this library is conveniently abstracted and
independent of Totem. Tracker Extract inserts the retrieved metadata
into the Tracker Store using the class nmm:Playlist, which is a subclass of
nfo:MediaList. The entries in the playlist are introduced as
nfo:MediaFileListEntry.

The supported playlist formats in Totem Playlist Parser are:
audio/x-mpegurl, totem-plparser, audio/x-scpls, audio/x-pn-realaudio,
application/ram, application/vnd.ms-wpl, application/smil and
audio/x-ms-asx.

Grilo does not support playlists in the latest stable version available,
so this feature would need to be added as specified in the requirements
section.

## Appendix: Questions & Answers

This chapter contains very specific questions that have been asked
during workshops.

##### Q: Will asking for a specific prioritization during metadata extraction increase the load by running multiple indexing jobs?

A: No. The Tracker scheduler will manage all metadata indexing
operations in internal queues, so prioritization will just change the
sorting of the metadata indexing operations, but not the overall system
load. Note that the scheduling system proposed in this document is not
implemented in Tracker yet. See [][Indexing scheduling] and
[][Tracker scheduling] for more details on prioritization and Tracker
scheduling.

##### Q: How does the system know when to renew thumbnails?

A: When a thumbnail is generated, some properties are stored inside it,
like the original URI and the modification time of the original file. If
the original file is modified at some point, its modification time will
be changed automatically by the Linux filesystem, so it is possible to
know when a thumbnail is outdated. Additionally, Tracker monitors
the filesystem for changes: in case a file is modified, added, moved or
deleted, its thumbnail will be automatically updated. Note: this feature
is not fully implemented yet, but it is part of the modifications
Collabora will implement.

##### Q: How is the MIME type of a file determined?

A: This is done through GLib, which finds out the MIME type in an
efficient way and is used extensively by all GNOME-based software.
The details of the algorithm used can be seen in the [Shared MIME Info
Specification][shared-mime-spec]; it has been designed to be robust and
efficient. The first thing done is to test the filename extension to see
if it is a recognized type. If this operation cannot be done or the
result is uncertain, a second check will be done using the first bytes
of the file, checking for the signatures of known file types. For more
details see
g\_file\_query\_info, G\_FILE\_ATTRIBUTE\_STANDARD\_CONTENT\_TYPE and
g\_file\_info\_get\_content\_type in the GNOME
documentation.

##### Q: How does video thumbnailing avoid black video frames, or uninteresting frames in general?

A: From [][Thumbnail management]: "Video thumbnails can be
generated using the GStreamer thumbnailing plugin. This plugin already
provides a heuristic method to extract the thumbnail from a video
stream, by selecting a frame with a wide distribution of colors (to
avoid presenting a title screen or other essentially-blank frame)."
Other approaches could be implemented if required, just by implementing
a thumbnailer plugin.

##### Q: How does document thumbnailing avoid thumbnails of blank pages?

A: The existing Tumbler plugins used to extract thumbnails from
Open/LibreOffice, PDF and Microsoft Office documents get the thumbnail
stored inside the file. It is the responsibility of the office applications
to write a proper thumbnail. Typically it is just the thumbnail of the
first page of the document, which is usually the best option, since the
first page contains the title in bigger font sizes, the cover of the
document, and logos. Any other approach is debatable, so Collabora does
not recommend making thumbnails from text-only pages: they are
less likely to be useful, as thumbnailed normal text becomes
unreadable.

##### Q: How can applications store and retrieve the last time a media file was played?

A: This functionality can be provided by the Grilo metadata store
plugin. The application must query the last values and set new values
through the Grilo API. The media file is identified via the file URI. The
metadata store plugin stores these values in a Tracker database. It
currently supports the following values: last position where a media item
was played (GRL\_METADATA\_KEY\_LAST\_POSITION), number of times a media
item has been played (GRL\_METADATA\_KEY\_PLAY\_COUNT) and last date a
media item was played (GRL\_METADATA\_KEY\_LAST\_PLAYED). Grilo
makes use of the properties already defined in the Tracker ontologies,
like nfo:lastPlayedPosition, nie:usageCounter and nie:contentAccessed. A
benefit of using Grilo is that Tracker details are not exposed to the
applications: for example, Grilo has an alternative plugin to
store these fields in a separate SQLite database in case Tracker is not
used, but the API to set and get these properties would be the same.

##### Q: How is a thumbnail retrieved?

A: Thumbnails can be retrieved in different ways depending on which
specific APIs the application is using. The best way for media
applications would be through the Grilo API; see
grl\_media\_get\_thumbnail and grl\_media\_get\_thumbnail\_binary\_nth
(in case several thumbnails are available for a media item). The Grilo API
internally uses the GLib library to retrieve this through
g\_file\_query\_info, G\_FILE\_ATTRIBUTE\_THUMBNAIL\_PATH and
g\_file\_info\_get\_attribute\_byte\_string. The Grilo API will need to be
modified in case more thumbnails need to be stored on USB flash
devices.

##### Q: How robust is the system against power loss?

A: This and other questions on system robustness will be answered in a
separate document focused on system robustness. In the meantime, please
see the chapter [][Indexing database on removable device] for a preview
of some issues regarding USB flash devices.

##### Q: How is a media file on a USB flash device identified?

A: It is identified by its complete URI, e.g. "/media/D8C0-024E/Joaquin
Sabina/Joaquin Sabina & Fito Paez - Llueve sobre mojado.mp3". In some
systems, USB flash devices are mounted on a directory with a hex
identifier (depending on system configuration).
This identifier is the
UUID (Universally Unique Identifier), not the label of the USB flash
device. It is generated when the filesystem is created, and it is
effectively unique. Generally it is a 128-bit identifier, but some
filesystems like VFAT have a smaller resolution (32 bits).

##### Q: Is the timeout for Tracker Extract operations configurable?

A: No, not currently, but it would be simple to make the timeouts
configurable, for example through GSettings. There are two timeouts: a
watchdog timeout, which checks that the Tracker Extract process does
not hang during metadata extraction (by default set to 20 seconds), and
an additional idle timeout, which stops a Tracker Extract
process if it has been idle for some time (30 seconds by
default).

##### Q: Does Tracker retry in case Tracker Extract fails due to the watchdog timer?

A: By default, Tracker retries up to two times if a Tracker Extract
process fails. It will also retry in case the file is modified or the
USB flash device where it is located is reinserted.

##### Q: Does Tracker store marks for corrupted files?

A: Currently, there is no property to identify corrupted files in
Tracker. A file whose extract process has failed due to corruption in
the file would just have properties from the nfo ontology (NEPOMUK
file object), but it would not have properties from other subclasses
like nmm (NEPOMUK
multimedia).

##### Q: There are reports that the performance of paged queries on Tracker databases is negatively affected by the number of rows in the database. Collabora to double-check.

A: Some tests running SPARQL queries have been done with databases of
nearly 6000 items and the mentioned problem was not reproducible (no
performance problems found). Please provide a data set and application
code reproducing this problem for further investigation.

##### Q: Should Tracker be used for radio station information?

A: Tracker already has ontologies to store radio station information.
So, it would be possible to use it to store and retrieve the user's
favorite radio stations. However, the interface to access and update
this information would be plain SPARQL, which has a steep
learning curve for developers. Additionally, the radio station
information is not shared with other applications. The only advantage of
using Tracker would be that the global search would automatically work
for radio station information, so it would not be necessary to implement
an extra global search plugin to look for this info in another database.
The final decision must take into consideration how well the existing
ontology for radio stations ([nmm:RadioStation]) is suited to
Apertis' roadmap.

##### Q: What happens when a USB flash device is inserted in a USB port?

A: When the user inserts a USB flash device, there are three main
components participating in the action:

  - **Linux kernel** (including device drivers). The kernel will be able
    to communicate with the device as soon as it is powered up,
    initialized and announced through the USB bus.

  - **[Udev]** is the device manager for the Linux kernel.
    Primarily, it manages device nodes in /dev. It is the successor of
    devfs and hotplug, which means that it handles the /dev directory
    and all user-space actions when adding/removing devices, including
    firmware loading. The Udev daemon listens to the netlink socket used by
    the kernel to communicate with user space applications.
    The kernel
    will send a bunch of data through the netlink socket when a device
    is added to, or removed from, the system. The Udev daemon catches all
    this data and does the rest, i.e., device node creation, module
    loading etc.

  - **[UDisks]** (formerly known as DeviceKit-disks) lies on top of
    udev, and is an abstraction for enumerating disk and storage
    devices and performing operations on them. It is a replacement
    for part of the functionality which used to be provided by the now
    deprecated HAL (Hardware Abstraction Layer). UDisks is a user-space
    daemon with a D-Bus interface which gets notifications from udev.

See the following table for an idea of what happens when a USB flash device is
inserted. The table provides a general idea of the timings for
different operations in the system. Note that, although the timings are based
on real measurements, they are not guaranteed, since all the software
components have not been completely built yet and timings depend on the
actual hardware used.

| Timeline (s) | Delay (s) | Event |
| ------------ | --------- | ----- |
| 0 | - | (1) User inserts a USB flash device in the system, one which has never been indexed before. |
| 2.8 | 2.8 | (2) UDisks daemon reports a USB flash device has been inserted via D-Bus. The user application could be autostarted at this point. |
| 3.6 | 0.8 | (3) UDisks daemon notifies that the partition on the USB flash device has been mounted automatically. The filesystem is accessible from now on. Tracker Filesystem Miner will start crawling the filesystem. |
| 4.9 | 1.3 | (4a) Media files in the root directory of the USB flash device are shown to the user. |
| 5.4 | 0.5 | (4b) Tracker has finished crawling the filesystem to find all entries in the filesystem. At this point we can have counters per media type. This timing measure was taken for a full filesystem scan of a USB flash device with 7 GiB used and 1407 files organized in multiple directories. As we can appreciate, there is a high fixed cost in (4a), while the total scan cost (4b) is not so high. |
| 6 | 0.6 | (5a) Tracker Extract has metadata for the files that have been returned in the first page shown to the user. |
| 46 | 40 | (5b) Tracker Extract finishes gathering metadata for all files in the USB flash device (7 GiB, 1407 files). This gives a throughput of approximately 34 song extractions per second. |

##### Q: How does the monitoring of filesystem changes work in Tracker?

A: The monitoring of changes in files and directories of the filesystem
is handled internally by Tracker Miner via the [GFileMonitor] API.
GFileMonitor is a GLib abstraction over the file monitoring
functionality, since several backends implementing such functionality
are available depending on the specific operating system. This
mechanism is a very efficient way to be notified about changes on the
filesystem, since the notifications are provided directly by the kernel,
instead of relying on active polling. Linux uses the
**inotify** backend. For a more detailed view of the inotify API see the
tutorial "*Monitor filesystem activity with [inotify]*".
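
To make this more concrete, the following minimal sketch (assuming a
GLib/GIO development environment; the watched path is just an example)
shows how a directory can be monitored in the same way:

```c
#include <gio/gio.h>

/* Called whenever something changes under the watched directory. */
static void
on_changed (GFileMonitor *monitor, GFile *file, GFile *other_file,
            GFileMonitorEvent event_type, gpointer user_data)
{
  gchar *path = g_file_get_path (file);
  g_print ("event %d on %s\n", event_type, path);
  g_free (path);
}

int
main (void)
{
  GMainLoop *loop = g_main_loop_new (NULL, FALSE);
  GFile *dir = g_file_new_for_path ("/media"); /* example path */
  GError *error = NULL;
  GFileMonitor *monitor =
      g_file_monitor_directory (dir, G_FILE_MONITOR_NONE, NULL, &error);

  if (monitor == NULL)
    {
      g_printerr ("cannot monitor: %s\n", error->message);
      return 1;
    }

  g_signal_connect (monitor, "changed", G_CALLBACK (on_changed), NULL);
  /* On Linux the "changed" events are delivered by the inotify
   * backend and dispatched from the main loop. */
  g_main_loop_run (loop);
  return 0;
}
```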

[upnp-architecture]: http://www.upnp.org/specs/av/UPnP-av-AVArchitecture-v1.pdf

[Tracker]: http://projects.gnome.org/tracker/

[RDF]: http://www.w3.org/RDF/

[SPARQL]: http://www.w3.org/TR/rdf-sparql-query/

[ontology]: http://developer.gnome.org/ontology/0.12/

[ontologies]: http://www.semanticdesktop.org/ontologies/

[Tracker SPARQL]: http://developer.gnome.org/libtracker-sparql/0.12/

[Tracker Miner]: http://developer.gnome.org/libtracker-miner/0.12/

[Tracker Extract]: http://developer.gnome.org/libtracker-extract/0.12/

[SQLite]: http://www.sqlite.org/

[WAL]: http://www.sqlite.org/draft/wal.html

[Tracker-formats]: https://live.gnome.org/Tracker/SupportedFormats

[Tracker Extract-docs]: http://developer.gnome.org/libtracker-extract/0.12/libtracker-extract-How-to-use-libtracker-extract.html

[Thumbnail Managing Standard]: http://specifications.freedesktop.org/thumbnail-spec/thumbnail-spec-latest.html

[thumb-dbus-spec]: https://wiki.gnome.org/DraftSpecs/ThumbnailerSpec

[Media Art Storage]: https://wiki.gnome.org/DraftSpecs/MediaArtStorageSpec

[Grilo]: https://wiki.gnome.org/Projects/Grilo

[gdata]: http://code.google.com/apis/gdata/

[libgdata]: http://developer.gnome.org/gdata/0.10/gdata-overview.html

[GDataYouTubeService]: http://developer.gnome.org/gdata/0.10/GDataYouTubeService.html

[REST]: http://en.wikipedia.org/wiki/Representational_state_transfer

[librest]: https://live.gnome.org/Librest

[libsoup]: http://developer.gnome.org/libsoup/

[Totem Playlist Parser]: http://developer.gnome.org/totem-pl-parser/stable/

[shared-mime-spec]: http://standards.freedesktop.org/shared-mime-info-spec/shared-mime-info-spec-latest.html

[nmm:RadioStation]: http://developer.gnome.org/ontology/0.14/nmm-ontology.html

[Udev]: http://www.kroah.com/linux/talks/ols_2003_udev_paper/Reprint-Kroah-Hartman-OLS2003.pdf

[UDisks]: http://www.freedesktop.org/wiki/Software/udisks

[GFileMonitor]: http://developer.gnome.org/gio/unstable/GFileMonitor.html

[inotify]: http://www.ibm.com/developerworks/linux/library/l-ubuntu-inotify/index.html

[tumbler-plugins-git]: http://git.xfce.org/xfce/tumbler/plain/plugins/

diff --git a/content/designs/multimedia.md b/content/designs/multimedia.md
new file mode 100644
index 0000000000000000000000000000000000000000..ed48b04bc883de3878359b2ea76e011be7ff32c0
--- /dev/null
+++ b/content/designs/multimedia.md
@@ -0,0 +1,494 @@
---
title: Multimedia
short-description: Requirements for multimedia handling (general-design)
authors:
  - name: Edward Hervey
---

# Multimedia

## Introduction

This document covers the various requirements for multimedia handling in
the Apertis platform.

The Freescale i.MX6 platform provides several IP blocks offering
low-power and hardware-accelerated features:

  - GPU: for display and 3D transformation/processing

  - VPU: for decoding and encoding video streams

The Apertis platform will provide robust and novel end-user features by
getting the most out of those hardware components. However, in order to
retain power efficiency, care must be taken in the way those components
are exposed to applications running on the platform.

The proposed solutions outlined in this document have also been chosen
for the Apertis platform to re-use as many “upstream” open-source
solutions as possible, to minimize the maintenance costs for future
projects based upon Apertis.

## Requirements

### Hardware-accelerated media rendering

The Apertis system will need to make use of the underlying GPU/VPU
hardware acceleration in various situations, mainly:

  - Zero copy of data between the VPU decoding system and the GPU
    display system

  - Be usable in WebKit and with the Clutter toolkit

  - Integration with Freescale and ADIT technologies

### Multimedia Framework

In a system like Apertis, writing a wide array of applications and
end-user features offering multimedia capabilities requires a framework
which offers the following features:

  - Handle a wide variety of use-cases (playback, recording,
    communication, network capabilities)

  - Support multiple audio, video and container formats

  - Capability to add new features without having to modify existing
    applications

  - Capability to handle hardware features with as little overhead as
    possible

  - Widely adopted by a variety of libraries, applications and systems

In addition, this system needs to be able to handle the requirements
specified in [][Hardware accelerated media rendering].

### Progressive download and buffered playback

The various network streams played back by the selected technology will
need buffering support based on the playback speed and the
available bandwidth.

If possible, a progressive download strategy should be used: with such a
strategy the network media file is temporarily stored locally, and
playback starts when it is expected that the media can be played back
without needing to pause for further buffering. In other words, playback
starts when the remaining time to finish the download is less than the
playback time of the media.

For live media where progressive downloading is not possible (e.g.
internet radio) a limited amount of buffering should be provided to
offset the effect of temporary jitter in the available bandwidth.

Apart from the various buffering strategies, the usage of adaptive
bitrate streaming technologies such as HLS or MPEG-DASH is recommended,
if available, to continuously adapt playback to the current network
conditions.

### Distributed playback support

The Apertis platform wishes to be able to share playback between
multiple endpoints. Any endpoint would be able to watch the same media
that another is watching, with perfect synchronization.

### Camera display on boot

Apertis requires the capability to show camera output during boot, for
example to quickly show a rear camera view for parking. Ideally, the
implementation of this feature must not affect the total boot time of
the system.

### Video playback on boot

Apertis requires the capability to show video playback during boot.
This shares some points with the section [][Camera display on boot]
regarding the requirements, the implementation, and risks and concerns.
Collabora has some freedom here to restrict the framerate, codecs,
resolutions and quality of the video to be played back in order to
match the requirements.

### Camera widget

Apertis requires a camera widget that can be embedded into applications
to easily display and manipulate camera streams.
The widget should offer the following features:

  - Retrieve the list of supported camera devices and the ability to
    change the active device

  - Support retrieving and updating color balance (saturation, hue,
    brightness, contrast), gamma correction and device capture
    resolution

  - Provide an interface for image processing

  - Record videos and take pictures

### Transcoding

*Transcoding* can be loosely described as the decoding, optional
processing and re-encoding of media data (video, audio, …), possibly from
one container format to another. As a requirement for Apertis,
transcoding must be supported by the multimedia framework.

### DVD playback

Most DVDs are encrypted using a system called [CSS] (content
scrambling system), which is designed to prevent unauthorized machines
from playing DVDs. CSS is licensed by the DVD Copy Control Association
(DVD CCA), and a CSS license is required to use the technology,
including distributing CSS-enabled DVD products.

Apertis wishes to have a legal solution for DVD playback available on
the platform.

### Traffic control

Traffic control is a technique to control network traffic in order to
optimize or guarantee performance, low latency, and/or bandwidth. This
includes deciding which packets to accept at what rate on an input
interface and determining which packets to transmit in what order at
what rate on an output interface.

By default, traffic control on Linux consists of a single queue which
collects entering packets and dequeues them as quickly as the underlying
device can accept them.

In order to ensure that multimedia applications have enough bandwidth
for media streaming playback without interruption when possible, Apertis
requires that a mechanism for traffic control is available on the
platform.

## Solutions

### Multimedia Framework

Based on the requirements, **we propose the selection of the [GStreamer
multimedia framework][GStreamer]**, an LGPL-licensed framework covering all of
the required features.

The GStreamer framework, created in 1999, is now the de-facto multimedia
framework on GNU/Linux systems. Cross-platform, it is the multimedia
backbone for a wide variety of use-cases and platforms, ranging from
voice-over-IP communication on low-power handsets to
transcoding/broadcasting server farms.

Its modularity, through the usage of plugins, allows integrators to
re-use all the existing features (like parsers, container format
handling, network protocols, and more) and re-use their own IP (whether
software- or hardware-based).

Finally, the existing ecosystem of applications and libraries supporting
GStreamer allows Apertis to benefit from those where needed, and benefit
from their on-going improvements. This includes the WebKit browser and
the Clutter toolkit.

**The new GStreamer 1.0 series will be used** for Apertis. In its 6
years of existence, the previous 0.10 series exhibited certain
performance bottlenecks that could not be solved cleanly due to the
impossibility of breaking API/ABI compatibility. The 1.0 series takes
advantage of the opportunity to fix the bottlenecks through API/ABI
breaks, so Apertis will be in a great position to have a clean start.

Amongst the new features the 1.0 series brings, the most important one
is related to how memory is handled between the various plugins.
This is
vital to support the most efficient processing paths between plugins,
including first-class support for zero-copy data passing between
hardware decoders and display systems.

[Several][ed-pres] [presentations][wim-pres] are available detailing in depth the changes in
the GStreamer 1.0 series.

### Hardware-accelerated Media Rendering

The current set of GStreamer plugins as delivered by Freescale targets
the GStreamer 0.10 series; for usage with GStreamer 1.0 these plugins
will need to be updated.

As Freescale was not able to deliver an updated set of plugins in a
reasonable timeframe, Collabora has done an initial proof-of-concept port
of the VPU plugins to GStreamer 1.0, allowing ongoing development of the
middleware stack to focus purely on GStreamer 1.0.

Eventually it is expected that Freescale will deliver an updated set of
VPU plugins for usage with GStreamer 1.0.

To benefit as much as possible from improvements provided by the
“upstream” GStreamer in the future, it is recommended to ensure
that platform-specific development is limited to features specific
to that platform.

Therefore it is recommended for the updated VPU plugins to be based on the
existing base video decoding/encoding classes (see [GstBaseVideoDecoder],
[GstBaseVideoEncoder]). This will ensure that:

  - The updated plugins will benefit from any improvements done in those
    base classes and future adjustments to ensure proper communication
    between decoder/encoder elements and other elements (like display
    and capture elements).

  - The updated plugins will benefit from commonly expected behaviors of
    decoders and encoders in a wide variety of use-cases (and not just
    local file playback) like QoS (Quality of Service), low latency and
    proper memory management.

### Buffering playback in GStreamer and clutter-gst

[ClutterGstPlayer] uses the [playbin2] GStreamer element for
multimedia content playback, which uses the [queue2] element to provide
the necessary buffering for both live and on-demand content. For the
Apertis release (12Q4) new API was added to clutter-gst to make it
easier for applications to correctly control this buffer. Work is
currently in progress to upstream these changes.

#### Progressive buffering based on expected bandwidth

Depending on the locality, it might be desirable to buffer based not only
on the currently available bandwidth, but also on the expected
bandwidth. For example, the navigation system may be aware of a tunnel
coming up, where no or only very limited bandwidth is available.

Due to the way buffering works in GStreamer, the final control over when
playback starts rests with the application. Normally an application uses
the estimate of the remaining download time provided by GStreamer (which
is based on the current download speed). In the case where the
application has the ability to make a more educated estimate by using
location/navigation information, it can safely ignore GStreamer's
estimate and base the playback start purely on its own estimate.

### Distributed playback

As the basis for the distributed playback proof-of-concept solution,
Collabora suggests the usage of the [Aurena] client/daemon
infrastructure. Aurena is a small daemon which announces itself on the
network using Avahi. This daemon provides the media and control
information over HTTP, and also provides a GStreamer-based network
clock for clients to synchronize against.
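
As a rough illustration of how an endpoint can slave its playback to such
a network clock, consider the minimal sketch below, written against the
GStreamer 1.0 API. The clock name, server address, port, media URI and
base time are placeholders: in a real deployment Aurena distributes these
values to the clients itself, and its actual protocol details differ.

```c
#include <gst/gst.h>
#include <gst/net/gstnet.h>

int
main (int argc, char **argv)
{
  gst_init (&argc, &argv);

  /* Hypothetical values: the address, clock port and shared base time
   * would really be obtained from the Aurena daemon over HTTP. */
  GstClock *net_clock =
      gst_net_client_clock_new ("shared-clock", "192.168.0.1", 8554, 0);

  GstElement *pipeline =
      gst_parse_launch ("playbin uri=http://192.168.0.1/media/song.ogg",
                        NULL);

  /* Slave the pipeline to the shared clock, and use the base time
   * provided by the server instead of computing a local one, so all
   * endpoints render buffers at the same moments. */
  gst_pipeline_use_clock (GST_PIPELINE (pipeline), net_clock);
  gst_element_set_start_time (pipeline, GST_CLOCK_TIME_NONE);
  gst_element_set_base_time (pipeline, 0 /* base time from server */);

  gst_element_set_state (pipeline, GST_STATE_PLAYING);
  /* ... run a main loop and handle bus messages here ... */
  return 0;
}
```

Because every endpoint selects the same clock and the same base time,
playback positions stay aligned across devices, which is exactly the
property the distributed playback use case needs.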

Aurena will be integrated in the Apertis distribution and an example
clutter-gst client will be provided.

As Aurena is an active project and further work on this topic is
scheduled for Q2 2014, more details will be provided on the
current state and functionality available in Aurena closer to that time.

### Camera and Video display on boot

In order to keep the implementation both low in complexity and flexible,
a pure user-space solution is recommended; that is to say, no kernel
modification or bootloader modification is done to enable this
functionality.

The advantage of such a solution is that a lot of common userspace
functionality can be re-used by the implementation. The main disadvantage
is that this functionality will only be available when userspace is
started.

To provide a general feeling for the timings involved when running an
unoptimized darjeeling image (130312) on the i.MX6 SabreLite board,
the boot breakdown is as follows (note that darjeeling isn't optimized
for startup time):

  - 0.00s: Power plugged in

  - 0.26s: u-boot started

  - 1.23s: Kernel starting

  - 4.12s: LVDS screen turns on

  - 4.59s: Initramfs/mini userspace starting

  - ~6.00s: Normal userspace starting.

> The u-boot boot delay was disabled for this test; no other changes were made.

Even though these numbers should be improved by the boot optimisation
work (planned for Q2 2013), the same order of magnitude will most
likely remain for the SabreLite hardware booting from MMC.

As a basic building block for providing this functionality,
[Plymouth] will be used. Plymouth is the de-facto application
used for showing graphical boot animations while the system is
booting, being used by Fedora, Ubuntu and many others. On most
systems Plymouth takes advantage of the modesetting DRM drivers, with
fallbacks to using the old-style dumb framebuffer or even a pure text
mode.

Plymouth has an extensive pluggable theming system. New themes can be
written either in C or using a simple scripting language. A good
overview of and introduction to the Plymouth-specific theme scripting can be
found in a series of [blog posts by Charlie Brej][brey-blog].

Plymouth has the ability to use themes which consist of a series of
full-screen images or, in principle, even a video file; however, most
boot animations are kept relatively simple and are rendered on the fly
using Plymouth's built-in image manipulation support. The reason for
this is simply an efficiency trade-off: while on-the-fly rendering adds
some CPU load, for simpler animations that load will still be lower
than loading every frame from an image file or rendering a video.
Furthermore, this approach reduces the size and number of assets which
have to be loaded from storage. As such, to minimize the impact on
boot performance, the use of simple themes which are rendered on the fly
is recommended over the use of full-screen images or videos.

To add support for the “camera on boot” functionality, Plymouth will be
extended such that it can be requested to switch to a live feed of the
(rear-view) camera during boot-up. To be able to support a wide range
of cameras (e.g. both directly attached cameras and IP cameras)
the use of GStreamer is recommended for this functionality. However, to
ensure boot speed isn't negatively impacted, GStreamer can't be used
from the initramfs, as this would significantly increase its size and
thus slow down the boot.
An alternative to using GStreamer would be
to implement dedicated, hardware/camera-specific plugins which are
small enough to be included in the initramfs.

During Q2 of 2013, work will be done to optimise the boot time of
Apertis, at which point it will become clearer what the real impact
of delaying camera-on-boot until the start of full userspace is.

### Camera widget and clutter-gst

To provide the camera widget functionality, a new actor was developed for
clutter-gst. Like any other Clutter actor, the ClutterGstCameraActor can
be embedded in any Clutter application and supports all requirements,
either through the usage of the provided convenience APIs or using GStreamer
APIs directly. Image processing is achieved with the usage of pluggable
GStreamer elements.

### Transcoding

GStreamer already supports [transcoding][gentrans] of various different media
formats through the usage of custom pipelines specific to each
input/output format.

In order to simplify the transcoding process and avoid having to deal
with several different pipelines for each supported media format,
Collabora proposes adding a new transcodebin GStreamer element which
would take care of handling the whole process automatically. This new
element would provide a stand-alone everything-in-one abstraction for
transcoding, much like what the playbin2 element does for playback.
Applications could then take advantage of this element to easily
implement transcoding support with minimal effort.

### DVD playback

[Fluendo DVD Player] is certified, commercial software designed to
play DVDs on Linux/Unix and Windows platforms, allowing legal DVD
playback on Linux using GStreamer. It supports a wide range of features
including, but not limited to, full DVD playback, DVD menus and
subtitles.

Other open-source solutions are available, but none of them meets the
legal requirements; Collabora therefore proposes the usage of the
Fluendo DVD Player, and to provide its integration on the platform.

### Traffic control

Traffic control and shaping comes in two forms: the control of packets
being received by the system (ingress) and the control of packets being
sent out by the system (egress). Shaping outgoing traffic is reasonably
straightforward, as the system is in direct control of the traffic sent
out through its interfaces. Shaping incoming traffic is however much
harder, as the decision on which packets to send over the medium is
controlled by the sending side and can't be directly controlled by the
system itself.

However, for systems like Apertis, control over incoming traffic is far
more important than controlling outgoing traffic. A good example
use-case is ensuring glitch-free playback of a media stream (e.g.
internet radio). In such a case, essentially, a minimal amount of
incoming bandwidth needs to be reserved for the media stream.

For shaping (or rather influencing or policing) incoming traffic, the
only practical approach is to put a fake bottleneck in place on the
local system and rely on TCP congestion control to adjust its rate to
match the intended rate as enforced by this bottleneck. With such a
system it's possible to, for example, implement a policy where traffic
that is not important for the current media stream (background
traffic) is limited, leaving the remaining available bandwidth for
the more critical streams.

However, to complicate matters further, in mobile systems like Apertis,
which are connected wirelessly to the internet and have a tendency to
move around, it's not possible to know the total amount of available
bandwidth at any specific time, as it's constantly changing. This
means a simple strategy of capping background traffic at a static
limit simply can't work.

To cope with this dynamic nature, a traffic control daemon will be
implemented which can dynamically update the kernel configuration to
match the current needs of the various applications and adapt to the
current network conditions. Furthermore, to address the issues
mentioned above, the implementation will use the following strategy:

  - Split the traffic streams into critical traffic and background
    traffic. Police the incoming traffic by limiting the bandwidth
    available to background traffic, with the goal of leaving enough
    bandwidth available for critical streams.

  - Instead of having a static configuration, let applications (e.g. a
    media player) indicate when the current traffic rate is too low for
    their purposes. This both means the daemon doesn't have to actively
    measure the traffic rate, and allows it to cope more naturally with
    streams that don't have a constant bitrate.

  - Allow applications to indicate which specific stream is critical, to
    properly support applications using the network for different types
    of functionality (e.g. a web browser). This rules out the usage of
    cgroups, which only allow for per-process granularity.

Communication between the traffic control daemon and the applications
will be done via D-Bus. The D-Bus interface will allow applications to
register critical streams by passing the standard 5-tuple (source IP and
port, destination IP and port, and protocol) which uniquely identifies a
stream, and to indicate when a particular stream's bandwidth is too low.

To allow the daemon to effectively control the incoming traffic, a
so-called Intermediate Functional Block (IFB) device is used to provide a
virtual network device acting as an artificial bottleneck. This is done
by transparently redirecting the incoming traffic from the physical
network device through the virtual network device, and shaping the traffic
as it leaves the virtual device again. The reason for the traffic
redirection is to allow the kernel's egress traffic control to
effectively be used on incoming traffic. This results in the example
setup shown below (with eth0 being a physical interface and ifb0 the
accompanying virtual interface).



To demonstrate the functionality described above, a simple
demonstration media application using GStreamer will be written that
communicates with the traffic control daemon in the manner described.
Furthermore, a testcase will be provided to emulate changing network
conditions.
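
To sketch what the application side of this interface could look like,
the fragment below registers a stream's 5-tuple over D-Bus using GDBus.
Every D-Bus name in it (bus name, object path, interface, method and its
signature) is a hypothetical placeholder, since the daemon's actual API
has not been specified yet:

```c
#include <gio/gio.h>

/* Register a critical stream with the (hypothetical) traffic control
 * daemon. All D-Bus names used here are illustrative placeholders. */
static gboolean
register_critical_stream (const gchar *src_ip, guint16 src_port,
                          const gchar *dst_ip, guint16 dst_port,
                          const gchar *protocol, GError **error)
{
  GDBusConnection *bus = g_bus_get_sync (G_BUS_TYPE_SYSTEM, NULL, error);
  GVariant *reply;

  if (bus == NULL)
    return FALSE;

  reply = g_dbus_connection_call_sync (
      bus,
      "org.example.TrafficControl",    /* hypothetical bus name */
      "/org/example/TrafficControl",   /* hypothetical object path */
      "org.example.TrafficControl",    /* hypothetical interface */
      "RegisterCriticalStream",        /* hypothetical method */
      g_variant_new ("(sqsqs)", src_ip, src_port,
                     dst_ip, dst_port, protocol),
      NULL, G_DBUS_CALL_FLAGS_NONE, -1, NULL, error);

  g_object_unref (bus);
  if (reply == NULL)
    return FALSE;

  g_variant_unref (reply);
  return TRUE;
}
```

A second method following the same pattern could then be used by the
application to report that the registered stream's bandwidth is
currently too low.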

[CSS]: http://www.dvdcca.org/css.aspx

[GStreamer]: http://gstreamer.freedesktop.org/

[ed-pres]: http://video.linux.com/videos/gstreamer-10-no-longer-compromise-flexibility-for-performance

[wim-pres]: http://gstconf.ubicast.tv/videos/keynote-gstreamer10/

[GstBaseVideoDecoder]: http://gstreamer.freedesktop.org/data/doc/gstreamer/head/gst-plugins-bad-libs/html/gst-plugins-bad-libs-GstBaseVideoDecoder.html

[GstBaseVideoEncoder]: http://gstreamer.freedesktop.org/data/doc/gstreamer/head/gst-plugins-bad-libs/html/gst-plugins-bad-libs-GstBaseVideoEncoder.html

[ClutterGstPlayer]: http://developer.gnome.org/clutter-gst/stable/ClutterGstPlayer.html

[playbin2]: http://gstreamer.freedesktop.org/data/doc/gstreamer/head/gst-plugins-base-plugins/html/gst-plugins-base-plugins-playbin2.html

[queue2]: http://gstreamer.freedesktop.org/data/doc/gstreamer/head/gstreamer-plugins/html/gstreamer-plugins-queue2.html

[Aurena]: https://github.com/thaytan/aurena

[Plymouth]: http://www.freedesktop.org/wiki/Software/Plymouth

[brey-blog]: http://brej.org/blog/?cat

[gentrans]: http://gentrans.sourceforge.net/docs/head/manual/html/howto.html#sect-introduction

[Fluendo DVD Player]: http://www.fluendo.com/shop/product/fluendo-dvd-player/

diff --git a/content/designs/multiuser-transactional-switching.md b/content/designs/multiuser-transactional-switching.md
new file mode 100644
index 0000000000000000000000000000000000000000..306313648ac72e054003fdb99c9e64d5ac314a70
--- /dev/null
+++ b/content/designs/multiuser-transactional-switching.md
@@ -0,0 +1,929 @@
---
title: Multiuser transactional switching
short-description: Technical analysis and recommendations
  (general-design)
authors:
  - name: Simon McVittie
  - name: Gustavo Noronha
  - name: Tomeu Vizoso
---

# Multiuser transactional switching

## Introduction

This document describes one particular set of use-cases for how multiple
users are expected to use the Apertis system, using the [Multiuser Design
document](multiuser.md)
as a base. It starts by describing the use cases that are
believed to be important in the automotive context, followed by a
technical analysis and recommendations.

The specific set of use cases on which this document focuses is a
“transactional” temporary switch between users, which is a relatively
unusual situation in mainstream computing, but has been identified as a
situation that is more likely to arise in an automotive environment.

In order to balance the various requirements and priorities that might
be present in OEM variants of Apertis, it is useful to consider
trade-offs such as not allowing user switching at runtime, if
implementing an ideal user experience for this feature would be too
onerous or only possible with a sub-par result. The amount of
customization allowed would then be reduced to account for this design
restriction, as discussed in [][Limiting customizability as a trade-off].

### Terminology

Please see the Multiuser Design document for the definitions used in
this document for jargon terms such as *user*, *user ID/uid*, *trusted*,
*system service*, *user service*, *multi-seat* and *fast user
switching*.

## Requirements

See the Multiuser Design document for general requirements applicable to
all aspects of the multi-user design in Apertis. This document focuses
on one set of use cases which has been identified as requiring detailed
design: a “transactional” switch between users.
The driver is the primary user of the car, and hence of the car's
infotainment interface; but because the driver must be able to focus on
driving, it is desirable that the front-seat passenger can “take over” a
shared screen (for instance, in the typical design that places a
touchscreen between the driver and front passenger) so that they can
carry out a task on the driver's behalf (for instance, programming a
navigation destination or finding a required piece of information).

The Apertis user interface is anticipated to be customizable, and the
passenger's preferences do not necessarily match the driver's. As a
result, it is desirable that the passenger can temporarily switch to a
set of preferences with which they are more familiar.

Depending on the specific use-case, it might be necessary for the
passenger to access their own private data (as opposed to the driver's
private data).

When switching users, it must be possible for open applications to
remain open. Some use-cases benefit from this, and some do not.

Some of the requirements from the Multiuser Design document are
particularly relevant to this design, and are re-stated here:

  - Switching users shall be performed with a smooth transition, with no
    visual flickering.

  - User switching should not take more than 5 seconds.

### Assumptions

Some of the requirements in the Multiuser Design document are stated in
terms of a class of possible sets of requirements, among which a
concrete design must make a choice. In this document we have assumed the
following requirements.

  - User data is private to each user:

      - Settings

      - Address book

      - Browser history

      - Application icons

      - Arrangement of icons in the app launcher

      - Account data for web services

      - Playlists

  - The following will be shared, if that makes the design simpler:

      - Applications (from the store)

      - Media library (music, videos)

      - Paired Bluetooth devices

  - Removable devices are accessible to all users and all users can
    unmount/eject them

The idea is that application binaries, libraries and other supporting
data, as well as media files, will be shared, but each user will have
their own view of those. That means, for instance, that when an
application is installed by a user for the first time, its icon would
appear only on the current user's launcher. When other users install the
application no download would be necessary – it would just be a matter
of making the icon appear on that user's launcher.

#### Out of scope for this document

Some configurations are outside the scope of this particular proposal.
They could be supported by a different concrete design within the
general framework described by the Multiuser Design document.

  - Multiple concurrent users are out of scope: to provide the desired
    performance on optimized hardware, each user's applications will not
    in general remain active when another user is logged in. Instead,
    the previous user's processes will be instructed to save their state
    and exit so that they can be resumed later. An implementation may
    have both sessions run in parallel for a short time if necessary, in
    order to facilitate a smooth transfer, but this is intended to be
    merely a transitional state.

  - Multi-seat (as defined by the Multiuser document) is out of scope:
    for the same reasons, if there are multiple screens, they will all
    be associated with the same user.
    In this document we do not aim to support separate concurrent logins
    on different screens (e.g. separate sessions for rear-seat
    passengers).

### Use-case scenarios

This design includes all of the use-cases described in the Multiuser
Design document, including those that require user switching and
optional privacy between users.

When we approach the implementation stage for this design, it would
benefit from input from a UX designer, with two main aims: first,
confirm that the use cases and their suggested workflows make sense; and
second, for use cases that benefit from “hinting” the user towards
particular actions, recommend ways in which this can be done.

#### Passenger acts on behalf of driver; access to driver's private data

Driver Diana and passenger Peter are on the way to visit Peter's friend
Fred. Diana asks Peter to check Fred's exact address. Fred has shared
the address on Facebook, in a post that is visible to Diana but not to
Peter, or to both Diana and Peter.

##### Trivial case

Assuming that the driver's display is situated
between the driver and the front passenger as is conventional, Peter can
use the shared display that is currently “logged in” as Diana. The
system has no way to distinguish between input from Peter and input from
Diana.

*Comments*: This use case is trivial to implement – indeed, it would be
difficult to avoid implementing it – and it is equivalent to the
behaviour of a single-user system. It is only mentioned here for
comparison with the more complex use-cases below, where the system needs
to be aware that the person using it has changed.

#### Switching for a transaction: access to non-driver's private data

Driver Diana and passenger Peter are on the way to visit Peter's friend
Fred. Diana asks Peter to check Fred's exact address. Fred has shared
the address on Facebook, in a post that is visible to Peter but not to
Diana.

Diana is the current user of the Apertis system. However, accessing this
information requires Peter's private data (in this case, Facebook
credentials).

##### Switching for a transaction

Peter selects a menu option labelled “Switch user...” or similar,
chooses his own name from a list of users,
and authenticates in some way if required. This switches the current
user of the Apertis system from Diana to Peter, so that he can view his
Facebook page and find Fred's address. For the purposes of this
particular use case, the initial state of Peter's session is not
significant (but see subsequent scenarios for situations where it does
matter).

##### Transferring selected data

Peter should be able to select Fred's address and set it as the satnav
destination, without leaving Diana able
to access his Facebook account in future.

##### Obviousness of current user context

If Diana and Peter have selected different user interface themes, it
should be obvious on whose
behalf the Apertis system is acting: it should use Peter's theme if and
only if it is working with Peter's data.

##### Core functionality not interrupted

Certain core functions in the infotainment domain should not be
interrupted by the user switch. For
instance, if Diana was listening to locally stored media or to the
radio, or using satnav to navigate to the city where Fred lives, this
should not be interrupted.
In particular, it must be possible for a
navigation-related notification (such as an imminent turning or a speed
limit change) to appear during the animated transition from Diana to
Peter.

##### Driver's settings retained for core functionality

It is important that the driver is not distracted. While Peter is using the
Apertis system, certain core functions should remain linked to the
driver's user preferences, and should take precedence over what Peter is
doing. For instance, if navigation is in the infotainment domain, it
should continue to use Diana's preferences to determine how far in
advance to warn Diana about a turning.

##### Alternative model, not recommended

An alternative model that could
be used for this transactional switching would be to use Peter's user
interface preferences (theme, etc.), but with all applications still
running as Diana, so that they have access to Diana's private data but
not to Peter's. However, this model would not satisfy point **a** of
this particular use case, because Diana's browser is either not logged
in to Facebook, or logged in as Diana; and it is undesirable to require
Peter to enter his Facebook password into Diana's browser. It also does
not satisfy point **c**: we feel that using Peter's UI theme for Diana's
browser would mislead Peter into believing that this browser is running
on his behalf, not Diana's.

#### Cancelling the transaction

Assume that the preconditions and events of use case [][Switching for a transaction: access to non-driver's private data] have
occurred. While trying to find Fred's address, Peter is distracted
(perhaps by a call on a phone not connected with the car) and does not
continue to interact with the HMI.

##### Driver regains control of Apertis system

Because some functions of the Apertis system are driver-focused, it must
be easy for Diana to
revert to her preferred configuration. If Peter's use of the system
is viewed as a “transaction”, then Diana reclaiming the system can be
viewed as “aborting” or “rolling back” the transaction.

This could occur either via a timer (when Peter stops interacting with
the HMI for some arbitrary length of time, control returns to Diana) or
via explicit action from either Peter or Diana (a menu option or
touchscreen gesture).

##### Diana's “last-used” state is restored

The foreground application, the set of background applications, and all
of their states
should be identical to how they were at the beginning of [][Switching for a transaction: access to non-driver's private data].
It is as if the “transaction” had never happened.

*Comments:* Returning to the last-used state is important for a variant
of this use-case: if Diana accidentally initiates user-switching, then
cancels the action, this should not result in state being lost.

Automatically switching via a timer could lead to undesired results, and
should be deployed with care: for instance, if Peter has left a photo of
Fred's house displayed on the screen to help Diana to identify where to
park, but has not explicitly used some “send to user...” action to
“complete the transaction” by explicitly sending that content back to
Diana, then it is undesirable for the system to switch back to Diana's
context if that would mean not displaying that photo.

As a result, we recommend that user-switching should be via an explicit
action, not via a timer.
One possible compromise would be for a timer to
trigger a notification that effectively asks “are you still there?”,
offering the actions “switch back to Diana” and “stay as Peter”.

#### Switching user, maintaining state – web

Driver Diana starts to look for information on a web page, then asks the
passenger Peter to take over so that she can concentrate on driving.
Peter wishes to authenticate as himself (as in [][Switching for a transaction: access to non-driver's private data]) so that
he can use his own display preferences, bookmarks, etc.

##### State transfer

Peter selects a menu option in Diana's web
browser labelled “Send to...” or similar, or uses a touchscreen gesture
with the same effect. He chooses his own name from a list of users, and
authenticates as himself. After Peter authenticates, the browser remains
open in Peter's session, and it displays the same web page that Diana
was looking at.

*Comments*: The user interface design for this requires some care to set
up the appropriate privacy expectations: if the action was phrased more
like “switch user” rather than “send to”, this would risk users
unintentionally sharing private state, leading to a loss of confidence
in the system.

##### Transfer back

Peter finds the desired information and selects
the “Send to...” option again. The browser remains visible in Diana's
session, displaying the same web page that Peter was looking at.

*Comments:* This use-case has privacy concerns due to the unclear
security model that has evolved over time for the Web, and must be
handled carefully. To fulfill the use case, the state that is
transferred must include the web page's URL and/or its content. In
either case this can lead to a poor UX or a security vulnerability if
mishandled, even taking into account that Peter can already see the
contents of Diana's screen:

  - If the state transfer is done by URL, suppose Diana is currently
    looking at a page for which Peter does not have the necessary
    credentials, for instance a private Google+ post from someone who is
    not Peter's friend. In this case, the first thing Peter will see is
    a “permission denied” message, which is not a friendly user
    experience.

  - If the state transfer is done by URL, suppose Diana is currently
    looking at a page whose URL is itself sensitive, for instance a
    Google Docs “shareable URL” that contains its own authentication
    token. In this case, by retrieving the URL from browser history,
    Peter now has perpetual access to edit that document, which was not
    intended by Diana and could be characterized as a security flaw.
    This could be mitigated by careful user interface design, for
    instance choosing a verb with implications of “send” or “share”.

  - If the state transfer is done by content, suppose Diana is currently
    looking at a page whose hidden content is sensitive, for instance
    one that contains an authentication token to act on Diana's behalf
    in an embedded form. In this case, by retrieving the content from
    browser cache, Peter now has access to that authentication token,
    which once again was not intended by Diana. Again, this could be
    mitigated by careful UI design.

  - If the state is transferred back to Diana (point **b**), there is an
    equivalent of each of those issues, with the roles reversed.

As a result of the issues described, Collabora recommends being careful
to set privacy expectations via UI design.
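To make the “URL is itself sensitive” point concrete, the fragment below
is purely illustrative: the query parameter names are invented, and no
such list can be exhaustive, which is exactly why the recommendation
above is careful UI design rather than automatic URL sanitization.

```python
# Purely illustrative sketch: strip query parameters that are known to
# carry authentication tokens before sharing a URL with another user.
# SENSITIVE_PARAMS is an invented, non-exhaustive example list.
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

SENSITIVE_PARAMS = {'token', 'auth', 'authkey', 'session'}

def shareable_url(url):
    parts = urlsplit(url)
    query = [(key, value) for key, value in parse_qsl(parts.query)
             if key.lower() not in SENSITIVE_PARAMS]
    return urlunsplit(parts._replace(query=urlencode(query)))

print(shareable_url('https://docs.example.com/d/abc?authkey=SECRET&page=2'))
# -> https://docs.example.com/d/abc?page=2
```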
##### Alternative model

The alternative model described in [][Alternative model, not recommended]
would avoid any privacy concerns, but it inherits the same issues as
described in that section, and is not recommended.

#### Switching user, maintaining state – music

In a situation similar to the scenarios above, driver Diana starts to
look for a particular song in the media library, then asks passenger
Peter to take over so that she can concentrate on driving. Assume that
Peter knows the desired song is in one of his playlists.

##### Peter's playlists are available

Peter should be able to use his
own playlists to find the song. There are two ways this could work,
depending whether playlists are considered to be private or merely
user-specific (see the Requirements section of the Multiuser Design
document).

If playlists are considered to be private, Peter must authenticate and
switch to his own user context, as in scenarios [][Switching for a transaction: access to non-driver's private data]
and [][Switching user, maintaining state – web], to locate his own playlist.

If playlists are not considered to be private, Peter may either switch
to his own user context, or locate the playlist while remaining in
Diana's configuration as in [][Passenger acts on behalf of driver access to drivers private data] (for instance, the music
player could show an unobtrusive “Peter's playlists” folder alongside
Diana's own playlists).

##### Peter's HMI configuration is available

To minimize frustration,
Peter should be able to use his own configuration/“look & feel” for the
media player to find that song, not Diana's unfamiliar configuration.

In practice, whether Peter will actually switch users in order to do
this seems likely to depend on which he finds more irritating – using an
unfamiliar user interface, or authenticating to switch user? – and on
whether he intends to do other things “as himself” after finding the
song. Remaining in Diana's configuration is covered by [][Passenger acts on behalf of driver access to drivers private data],
so we assume here that he does switch.

##### Active app remains active

The media player should still be the
active app after Peter has switched to his own user context. The other
apps that were running last time Peter used the car are not started.

##### Non-private state is transferred

If Peter does switch to his own
user context, the state in which Diana was viewing the media library
browser (e.g. the currently viewed album) is preserved.

##### Non-private state can be transferred back to Diana

Peter finds
the appropriate playlist, queues the song for playing and stops using
the Apertis system. If he opts to use a similar “send...” option to
return control of the Apertis system (as in scenario [][Switching user, maintaining state – web]), the state
in which Peter was viewing the media library browser is preserved, i.e.
the playlist remains displayed. If he merely switches back (“cancelling”
the transaction as in scenario [][Cancelling the transaction]), the media player returns to the
state that was saved as part of Diana's session during point **a**.

##### Alternative model

The alternative model described in [][Alternative model, not recommended] would
naturally satisfy points **b**, **c**, **d** and **e**, but would not
satisfy point **a** unless playlists are not considered to be private.
#### Switching user, maintaining state – unknown app

In a situation similar to [][Switching for a transaction: access to non-driver's private data], driver Diana starts to look
for a particular item in an arbitrary third-party app not specifically
known to the system (e.g. a restaurant guide), then asks passenger Peter
to take over.

##### Switching user

Suppose Peter knows that the desired restaurant
is saved in his favourites, or believes that it would be easier to find
in his user interface configuration. He should be able to authenticate
and use his own configuration to find it.

##### Active app remains active

The restaurant guide should still be
the active app after Peter has switched to his own user context. Like
[][Switching user, maintaining state – music], but unlike the “user switching” scenario described in
the Multiuser Design document, the other apps that were running last
time Peter used the car are *not* started.

##### Non-private and transient state is transferred

Suppose Diana has
got part way through finding the desired restaurant, and has narrowed
down search results to the correct city. Peter should not be required
to repeat that process: the first thing he sees after login should be
the same search results. If the user interface is designed to set the
expectation that state will be transferred, using words such as “send”
or “share”, then the amount of state that can be transferred without
violating that expectation is greater.

##### Private state is not transferred

Because third-party apps could
do anything, and the level of privacy of the data they deal with will
vary greatly, it should also be possible for the app developer to avoid
transferring some or all of its state between users. For instance, if
Diana is logged in to the restaurant guide app so that she can submit
reviews, her login credentials must not be transferred to Peter.

##### Explicitly returning state transfers it back

If Peter “sends
back” the state in which he was viewing the restaurant guide, similar to
scenario [][Switching user, maintaining state – web], then that state is seen in Diana's instance of the app.

##### Cancelling the transaction restores previous state

If Peter
merely cancels the transaction and lets the system return to Diana,
similar to [][Cancelling the transaction], then the state in which the app was saved
before point **a** is restored.

##### Alternative model

The alternative model described in [][Alternative model, not recommended] would
naturally satisfy all these points except the first half of the first,
unless favourite restaurants are not considered to be private.

#### App declines to transfer state

Suppose a current user Alice (who could either be the driver or
passenger; there is no distinction in this use case) is using the
Apertis system under her own user context. She is using a third-party
app whose designer does not consider it appropriate to transfer
any state to another user under any circumstances, for example a
saved-password manager or an online banking app.

Suppose Alice attempts to transfer state to another user Bob, as in
[][Switching for a transaction: access to non-driver's private data],
[][Switching user, maintaining state – web] etc.

##### State transfer does not occur

In this particular app, there is
no state that would be appropriate to transfer to Bob.
The user switch
should not occur: for example, this could be implemented by displaying a
notification instead of starting the switching process, or by putting an
explanatory message where the list of possible users would normally
appear. If the UX design is such that apps normally have a “send to...”
menu option or button, it could appear disabled, or not be present at
all.

However, if sending to another user is done via a touch gesture, there
is no direct equivalent of a disabled option. In particular, touch
gestures should always have visual feedback, whether successful or not
(similar to the way scrolling is often made to “bounce” at the end of
the scrollable range). This is so that the user can distinguish between
an unrecognized gesture, and a recognized gesture that did not result in
an action in this specific case.

*Comment*: this does not arise when cancelling a transaction as in
[][Cancelling the transaction], because that action does not transfer state in any case.

#### Switching user, maintaining state – missing app

Similar to [][Switching user, maintaining state – unknown app], driver Diana starts a restaurant guide app, then asks
Peter to take over. This time, suppose that Peter has not installed the
restaurant guide, so the system will not be able to reproduce the
current state for Peter.

##### Impossible state transfer is not offered

Similar to the previous
scenario, the system should not offer the ability to send state to
Peter.

One possible implementation would be to avoid displaying a “send to
user...” control in applications that are not installed for any other
users, and to avoid listing Peter in the menu of possible users if he
does not have the application. This has the disadvantage that in a
system with three or more users, it could become non-obvious why some
applications display that control and some do not, and why some users do
not always appear in the menus.

##### Alternative design

Another design that was considered is to run
the app anyway, on the basis that it is in fact already installed on the
system. However, this undermines the abstraction that each user has
their own collection of apps. It also does not address the issue that
the app might require accepting a EULA, approving a request for special
OS permissions (access to GPS, etc.) or similar actions, which Peter has
not done.

##### Alternative design

A third design that was considered is to
present a choice between “just switch user to Peter” (which would
restore his last-used state) and “don't switch”. However, presenting the
driver with a distracting prompt/question is undesired.

##### Alternative design

A fourth possibility is to switch to Peter,
with the initial state in Peter's session automatically opening the app
installation procedure. If Peter chooses to install the relevant app,
the state transferred from Diana should be inserted into the “newly
installed” app. If Peter does not install the relevant app (for example
because he does not agree to an EULA or OS permissions request), the
transaction should be cancelled (as in [][Cancelling the transaction]).

*Comments*: as with the previous scenario, this scenario cannot occur
when cancelling a transaction as in [][Cancelling the transaction], because that action
does not transfer state in any case.

### Technical considerations

The use-case scenarios described above affect several design decisions,
which may lead to technical challenges.
The most important of all is the implication that most of the state,
including applications, remains the same after a user switch.

One possible approach is that the application remains running through a
user switch and simply loads the private data of the new user.

As noted in the more general Multiuser design document, implementing
such a feature would require doing away with the separation of
privileges provided by using one UNIX uid (user account ID) and one X or
Wayland session per user, pushing all of the burden of authorization and
tracking states for each user onto the applications. The complexity for
application authors could be alleviated by providing some common high
level APIs, but even in that case it would all be new, untested code. It
would also put too much trust in the applications themselves, which
would each be treated as a security boundary in this model; it is highly
likely that some applications would mishandle user checks, allowing data
to leak from one user session to the other.

For these reasons, Collabora does not recommend this approach; instead,
as in the more general Multiuser design document, we recommend that each
user is represented by a separate UNIX uid, with all state transfer
between users mediated by system services.

Furthermore, we believe it is important to consider a number of
trade-offs regarding the desired functionality and the technical
viability of the solution. The recommendations below try to strike a
balance between ease of use, complexity for the application developer,
stability and security.

## Approach

This chapter goes over each of the requirements, presenting the
trade-offs Collabora feels are necessary and proposing technical
solutions to approximate the desired user experience as closely as
possible.

### Multiple users should be able to use the system, though not concurrently

Collabora recommends adopting the usual approach of one UNIX uid per
user, similar to the approach used for desktop and laptop systems.

In most GNU/Linux distributions a user switch is performed by running a
second instance of the X server and starting a second session with the
appropriate uid, after which subsequent switching between the same users
is simply a switch between those two X servers (the so-called “fast user
switching” model, described in more detail in the Multiuser design
document).

A similar approach could be adopted for Apertis, but it would most
certainly lead to memory pressure very quickly. For this reason
Collabora believes the best way to implement user switching is by
closing down the whole session of the current user, saving applications'
states while doing so, and only then starting the session for the other
user. This is equivalent to the procedure used in most GNU/Linux
distributions for a “log out” operation followed by a new login, but
with the addition of a “save state” step before closing each
application; it is also very similar to switching between user accounts
on Android devices. A sketch of this switching sequence is shown below.
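The following minimal sketch only illustrates the ordering of that
sequence; all names are hypothetical. In a real implementation the
“save state and exit” request would be a D-Bus message to each
application, and the hand-over between sessions would involve the
compositors discussed later in this document.

```python
# Minimal sketch of the recommended switching sequence; all names are
# hypothetical, and a real implementation would use D-Bus requests and
# compositor hand-over rather than direct method calls.

class App:
    def __init__(self, name):
        self.name = name

    def save_state_and_exit(self):
        # The application persists its state to its per-(user, app)
        # data area, then terminates.
        print(f"{self.name}: state saved, exiting")

def switch_user(running_apps, next_user):
    # 1. Ask every application in the outgoing session to save and exit.
    for app in running_apps:
        app.save_state_and_exit()
    # 2. The outgoing session compositor hands the display over to a
    #    system compositor, which shows the transition animation.
    # 3. Start the incoming user's session, restoring their saved state.
    print(f"starting session for {next_user}")

switch_user([App("browser"), App("media-player")], "peter")
```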
#### Potential for concurrent users as a future enhancement

One option that can be considered is to provide additional hardware
resources in systems shipping for premium segment cars, such as doubling
the available RAM. This would help with memory pressure and make the
approach involving two X servers an achievable goal, with the caveats
discussed below.

CPU usage, for instance, could become a problem and degrade the
performance experienced by the second user if the programs running on
the first user's session are kept running. One possible solution for
this is a freeze/thaw approach in which the first user's applications
remain present in memory, but their execution is paused until the first
user's session is resumed.

If the first user's session is not frozen completely, then services
running outside the user sessions, such as the media player (see
[][Switching users should not disturb some of the core functionality such as music playing] below) would need to deal with the fact that there are now
potentially two controlling user interfaces, and handle multiple
connections gracefully.

Other system resources would also probably need to be regulated, such as
muting the applications of the first user so that a game running on their
session would not interfere with a different game running on the second
user's session. Bandwidth regulation may become more complex as well,
to ensure that no application from the first user interferes with
streaming being performed inside the second user's session; there are
also implications that need to be considered if the first user's phone
is being used for the Internet connection and has a metered data plan
that the second user will now be using.

### Switching users should not disturb some of the core functionality, such as music playing

This kind of cross-session functionality points to two probable design
decisions. The first is that at least some of the data used by the
various users should be shared; for example, music should probably be
stored in a shared repository rather than in a user's private storage
area.

The second is that the application that performs the actual playing must
persist. There are two major options here, from which a concrete
recommendation has not yet been chosen; depending on other requirements,
the various “core” components do not necessarily all need to take the
same solution.

  - It could be a *system service* (as defined by the Multiuser Design
    document) rather than a regular user-level application. That means
    it will be executed outside the user session, and be controlled by
    the user interface via D-Bus or a similar inter-process
    communication mechanism; this naturally results in its state, such
    as the list of tracks currently queued, being shared between all
    users. The relevant user interfaces in each user's session could all
    communicate with the same system service.

  - It could be a *user service* running on behalf of the driver, which
    is flagged not to be terminated during user-switching, and
    communicates with the users via notifications.

Other “core” services that need to span multiple user sessions,
such as navigation (if present in the Apertis domain), could follow a
similar design. For example, the oFono service used for telephony is
already a system service, so taking the “system service” option for that
is a natural approach.

If the system service needs to distinguish between users and act on
behalf of a specific user in response to their requests, it is important
to note that this results in it being part of the TCB (trusted computing
base) responsible for enforcing separation between users. Such services
should be checked to ensure that they do not violate the system's
intended security model. The sketch below shows one way a service can
attribute requests to users.
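As a rough illustration of the last point, a system service can obtain
the kernel-verified uid of each caller from the D-Bus daemon instead of
trusting the caller to identify itself. The bus name, interface and
method below are invented for this sketch, and running it on the system
bus would additionally require suitable D-Bus policy.

```python
# Sketch of a media player system service that attributes each request
# to the calling user. All D-Bus names are hypothetical; a real service
# would also enforce policy, since it is part of the TCB separating
# users from each other.
import dbus
import dbus.service
from dbus.mainloop.glib import DBusGMainLoop
from gi.repository import GLib

class MediaPlayer(dbus.service.Object):
    def __init__(self, bus):
        super().__init__(bus, '/org/example/MediaPlayer')
        self.bus = bus
        self.queues = {}  # per-uid track queues

    @dbus.service.method('org.example.MediaPlayer',
                         in_signature='s', sender_keyword='sender')
    def Enqueue(self, track, sender=None):
        # Ask dbus-daemon for the uid owning the caller's unique name:
        # this is kernel-verified, not supplied by the caller.
        uid = self.bus.get_unix_user(sender)
        self.queues.setdefault(uid, []).append(track)

DBusGMainLoop(set_as_default=True)
bus = dbus.SessionBus()  # a real deployment would use dbus.SystemBus()
name = dbus.service.BusName('org.example.MediaPlayer', bus)
player = MediaPlayer(bus)
GLib.MainLoop().run()
```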
### When the user starts the system they should find the same applications they had left open at shutdown, and in the same state

This topic is discussed in the Multiuser and Applications design
documents. The only aspect directly relevant to this particular document
is that the same “save state” step that would be done during shutdown
should be performed when switching away from a user, so that the saved
state can be reloaded for use cases such as scenario [][Cancelling the transaction].

### When switching users, open applications must remain open

This requirement exists to enable use cases in which the driver asks a
passenger who is also a user of the system to perform some task
([][Switching user, maintaining state – web], [][Switching user, maintaining state – music],
[][Switching user, maintaining state – unknown app] and [][Switching user, maintaining state – missing app]
are examples of this category). This passenger would log in, but at least part of the
state of the driver's session would remain.

We see three possible ways to satisfy these use cases:

  - Transfer the state of all apps from the driver's session to the new
    user's session

  - Transfer the state of the single foreground app from the driver to
    the new user

  - Do not transfer any state

In some use cases such as [][Cancelling the transaction] and [][App declines to transfer state],
having the same applications
open after a switch is not a desirable user experience. It is not
necessarily true that the user would like to use, for instance, the
browser if the previous user had it open. It's also not clear that the
currently open browser tabs are necessarily interesting to the new user,
particularly if they will “overwrite” the new user's saved browser tabs
from their last session.

Finally, the privacy implications of implicitly transferring state are
considerable, with significant potential for “over-sharing”; this could
cause users to lose confidence in the system and avoid using it for
personal data, reducing its usefulness.

Our recommendation is that each application should have a way to
indicate to the operating system whether it is able and willing to send
(partial or full) state to another user's instance of the same
application. If it is, the HMI can display a “send to other user” option
for that application; if the application is such that state transfer is
unsafe or never useful, or if it simply does not support state transfer,
then that option would appear disabled (greyed-out) or not appear at
all.

The actual state transfer would be similar to the state saving mechanism
that is already needed for save/restore functionality (as discussed in
the [Preferences and Persistence design document](preferences-and-persistence.md), and more briefly
in the [Multiuser](multiuser.md) and [Applications](applications.md) design documents), but placing
state in memory or in an OS-supplied temporary directory instead of in
the per-(user, app) data directory. We recommend that similar data
formats and API conventions should be used, so that in trivial cases
where there is no private state, the application's implementations of
“save state” and “send state” can call into the same common code.
However, it should be presented as a separate, parallel API call, to
encourage application authors to think about the amount of state
transfer between users that is desired. Similarly, the “restore state”
and “receive state” operations should be distinct, but follow similar
enough conventions that they can share an implementation if that is what
the application author wants. A sketch of this convention follows.
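The sketch below illustrates the convention under stated assumptions:
the function names and the use of JSON are invented for this example,
not a defined Apertis API. The point is that “save state” and “send
state” share a common implementation, while private data only ever
appears in the saved, per-user variant.

```python
# Illustrative sketch of the parallel "save state" / "send state"
# convention; the function names and JSON format are assumptions.
import json

def collect_state(include_private):
    state = {'current_view': 'search-results', 'city_filter': 'Springfield'}
    if include_private:
        # Credentials belong only in the per-(user, app) saved state;
        # they must never be included in state sent to another user.
        state['session_token'] = 'secret-token'
    return json.dumps(state)

def save_state():
    # Full state, persisted to this user's private data area on
    # shutdown or user switch, for later restoration.
    return collect_state(include_private=True)

def send_state():
    # Partial state offered to another user's instance of the app.
    # Returning None here would indicate that the app declines state
    # transfer, letting the HMI disable its "send to user..." option.
    return collect_state(include_private=False)
```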
Optionally transferring the state of a single foreground app, with
vendors encouraged to design their HMIs to set appropriate privacy
expectations for this action, seems a reasonable compromise between the
convenience of transferring state when it is desired, and the disruption
and privacy concerns of transferring state when it is not desired.

We do not recommend the alternative model in which the superficial
appearance of the passenger's preferences is applied to processes that
continue to run with access to the driver's personal data (as outlined
in [][Alternative model, not recommended]), since that approach seems likely to lead to users' privacy
expectations not matching the reality. This applies to both the driver's
privacy (it is not entirely obvious that the passenger can still access
the driver's private data) and the passenger's privacy (for instance, it
is not at all obvious that the passenger should not enter passwords into
what appears to be “their” browser).

### Switching users shall be performed with a smooth transition, with no visual flickering

This topic is discussed in the Multiuser Design document. We recommend
the approach involving the first user's session handing off to a
separate system-level compositor, which in turn hands off to the second
user's session.

### User switching should not take more than 5 seconds

This requirement puts pressure on how long the user session may take to
close down. An application that spends a lot of time writing state
or doing some other processing, like an email client synchronizing its
state with a slow IMAP server, may significantly increase the time
required to complete the switch. This means care must be taken in
application development to avoid such delays.

Other than that, Collabora believes the system components for user
switching should be fast, and that the 5-second goal is achievable.

Note that in a premium car system, depending on the additional amount of
memory available, the applications would not necessarily really be
closed down, so this requirement could more easily be achieved by simply
freezing the existing session or not touching it at all.

### User data is private to each user

By using the traditional “one UNIX uid per user” approach, each user
will have their own home directory protected by the usual mechanisms, such
as file ownership and user and group permissions, in addition to the
AppArmor restrictions described in the [Security Design document](security.md).
Note that usage of the UNIX home directory concept, in which a single
directory has all of a given user's files, is not in the plans for
Apertis. Instead, each application will store its data in a directory
named after the UNIX user account, and owned by the appropriate uid, but
inside the application directory.

More information about this can be found in the Applications design.

#### However, some data will be shared

The requirements state that optionally some data can be shared if it
makes the problem more tractable. Collabora believes it is a good idea
to share installed applications and data such as the music library.
Making installed applications per-user would make application
management much more complex, including possibly having to waste space
by keeping two separate versions of the same application available.

A custom view can still be provided for each user. The icons for
applications may appear only if the user explicitly installs the
application, which in this case would not cause a new download, just the
addition of the icon to the user's launcher. The same can go for other
kinds of user interface aids such as playlists, providing the user with
a way of picking the songs or videos they are interested in from the
shared library.

More information about this can be found in the Applications
design.

### Removable devices are accessible to all users and all users can unmount/eject them

This requirement can be fully satisfied by the proposed approach. The
mounting and unmounting of devices is a privileged operation that is
already mediated by system services, so allowing any user to mount and
unmount devices, no matter who mounted them in the first place, is
simply a matter of configuring that policy.

## Limiting customizability as a trade-off

If Apertis ends up being designed with no user switching, or even no
multi-user capabilities, then it might be desirable to consider limiting
the customizability of the system, so that drivers who seldom use the
system are not burdened by the customizations of the main driver.

As a general principle, the easier it is made to switch between users,
the more customizability can be offered without it becoming a problem.
One special case is that the mechanism to switch users should remain
obvious and in a consistent location in all configurations and themes.
Similarly, the user interface for driver-focused tasks, such as the icon
to open satnav functionality, should remain consistent between
configurations.

If user-switching is absent or limited, Collabora believes that any
customization that allows relocation of items and interface controls
should be avoided. That means any configuration for the positions or
visibility of menu items, application launchers, and core user interface
elements such as the status bar, the back button, and so on should not
be allowed.

Appearance customization, such as colour scheme, should not cause
trouble for a casual user of the system trying to find their way. The
same goes for features that allow organization of user data, such as the
creation of custom playlists or photo albums. However, configuration of
fonts and font sizes can cause the core UI elements to change layout in
ways that might be confusing, so allowing configuration for those needs
to be considered carefully.

## Recommendations summary

As discussed in [][Multiple users should be able to use the system though not concurrently],
Collabora recommends having one UNIX user
account ID (uid) per user. The first user to be registered in a new
system must be able to perform administration tasks such as system
updates, application installation, creation of new users and setting up
permissions, as discussed in the main Multiuser Design document.

At a conceptual level, user switching should be done by closing down the
user session and starting the new user session, to avoid memory
pressure. However, implementors should consider allowing the old session
to run in parallel for a short time while applications are given a
chance to save and exit.
Running two user sessions in parallel for an
extended period of time, to enable “fast user switching”, can be
considered for premium cars with greater computing resources available.

Services that need to stay running after a user switch should have their
background functionality split from their UIs, as discussed in
[][Switching users should not disturb some of the core functionality such as music playing];
they can either run as a different UNIX user account ID – a “system
service” – or be a specially flagged “user service” that is not
terminated with the rest of the session.

Collabora recommends against trying to have a login mode that moves the
entire session state from the current user to the user that is logging
in, as described in [][When switching users open applications must remain open].
To satisfy use cases in which the
current state of one user's application is sent to another user's
instance of the same application, it would be sufficient to have that
single application save and restore state, using a mode which omits
private data from the state where necessary. It is not necessarily
possible or desirable to implement this for every application, and care
must be taken to set appropriate privacy expectations.

Ways of having a smooth visual transition when switching users are
discussed in the main Multiuser Design document. Collabora recommends
the use of multiple Wayland compositors, with the first user's *session
compositor* handing over control of the graphics device to a *system
compositor* to perform the switch, which in turn hands over the graphics
device to the second user's session compositor.

Collabora recommends in [this section][However, some data will be shared] that data for applications and
media files be shared among users to avoid duplication, with custom
views allowing per-user customization.

diff --git a/content/designs/multiuser.md b/content/designs/multiuser.md
new file mode 100644
index 0000000000000000000000000000000000000000..75ec784327a9daa6926da544098da26348dec2ee
--- /dev/null
+++ b/content/designs/multiuser.md
@@ -0,0 +1,1422 @@
---
title: Multiuser
short-description: Guide and recommendations to help design the multiuser system (general-design)
authors:
  - name: Simon McVittie
---

# Multiuser

## Introduction

This document describes how multiple users are expected to use the
Apertis system, and works mostly as a guide and set of recommendations
to help design the system. It is intended to act as an “umbrella”
document covering the multi-user topic in general, and will be
supplemented by more concrete documents describing particular use-cases
and recommendations for how those use-cases can be addressed.

At the time of writing, there is one such document, “Multiuser Design:
Transactional Switching”. Please see
[*https://wiki.apertis.org/ConceptDesigns*](https://wiki.apertis.org/ConceptDesigns)
for current design documents.

The driving force behind having a multi-user system is to allow
customization of the system. A car may have multiple drivers and
passengers who would be frustrated by customizations done by each other
to the system's look and feel, and even to data such as playlists. Having
multiple users allows each to customize their own interface.

Depending on OEM and consumer requirements, multi-user systems can
potentially also provide personal files and online accounts for each
user.

## Terminology and concepts

### “user” vs. “uid”
In a Unix system, users are typically identified by a numeric user ID,
often abbreviated “uid”. A uid can represent a person, a system
facility, multiple people, or even an application (as in Android).

Because these do not correspond 1:1 in some designs, it is important to
be clear which one is under discussion. In this document, the jargon
term *uid* or *user ID* is used to refer to a Unix user identifier,
while *user* or *person* is used to refer to a human using the system.

*User account* refers to any abstract representation of the user within
the system. This is most commonly a uid, matching the original Unix
design. However, systems can exist with multiple uids per user account,
such as Android, in which each (user account, app) pair has a uid.
Conversely, systems can exist with multiple user accounts sharing a uid,
such as SteamOS (in which one uid runs the Steam Big Picture UI, and
users log in to it with separate Steam accounts).

The canonical form of a Unix uid is numeric, but for ease of reference,
a short lower-case textual *username* may be used to refer to a uid. For
example, it is common to talk about system users named “root” and
“backup”, but the real identities of these users within the system are
the corresponding numeric uids 0 and 34; the usernames are merely for
convenience and mnemonic value.

### Trusted components

A *trusted* component is a component that is technically able to violate
the security model (i.e. it is relied on to enforce a privilege
boundary), such that errors or malicious actions in that component could
undermine the security model. The *trusted computing base* is the set of
trusted components. This is independent of its quality of implementation
– it is a property of whether the component is relied on in practice,
and not a property of whether the component is *trustworthy*, i.e. safe
to rely on. For a system to be secure, it is necessary that all of its
trusted components be trustworthy.

One subtlety of Apertis' app-centric design is that there is a trust
boundary between applications even within the context of one user. As a
result, a multi-user design has two main layers in its security model:
system-level security that protects users from each other, and
user-level security that protects a user's apps from each other. Where
we need to distinguish between those layers, we will refer to the *TCB
for security between users* or the *TCB for security between apps*
respectively.

### System services

A *system service* is a service that, conceptually, runs on behalf of
the whole computer or car, without a division between users. In designs
where each user has a distinct uid, system services run under a system
uid, either root (the most privileged uid) or a special unprivileged uid
per service or group of services; they do not run with the uid of any
particular user.

This term does not necessarily imply anything about whether the service
is considered to be “part of the operating system”, or whether it is
part of a preinstalled or user-installable application bundle as
discussed in the Applications Design document. However, because system
services can accept requests from multiple users, any system service
that will handle users' private data must be trusted to impose a
privilege boundary.

Examples of system services commonly present in Linux systems include
ConnMan, NetworkManager, BlueZ, udisks and the D-Bus system bus.
### User services

A *user service* is a service that runs on behalf of a particular user.
In designs where each user has a distinct uid, each user's user services
typically run under that same uid; in designs like SteamOS where all
users share a single generic uid representing “all users”, user services
would typically share that same uid.

Examples of user services commonly present in Linux systems include
dconf, gvfs, Tracker, Tumbler and the D-Bus session bus.

Identifying a process as a user service is independent of whether it is
treated as part of the Apertis platform and independent of any
particular application (such as the user services mentioned above), or
treated as part of an application bundle (“agents” associated with
apps).

### Multi-seat (logind seats)

In the context of a multi-user system, a *seat* is a collection of
display and input devices, optionally linked to other devices such as a
USB socket or optical drive, intended to be used by one user at a time.
Typical PCs only offer one seat, but a second graphics adapter, often
connected via USB, can be used to add additional seats (a *multi-seat*
system).

This jargon term is commonly used in Linux system services such as
systemd-logind, the older ConsoleKit, and GDM. In the context of a car,
it should be noted that it does not necessarily correspond precisely to
the car's seats: for instance, in the common layout that places a single
“head unit” touchscreen between the driver and front passenger, that
touchscreen and any USB sockets adjacent to it would be treated as a
single seat. If, for example, additional touchscreens were added behind
the front seats for use by rear passengers, that would be a multi-seat
system with 3 seats (front, rear left, rear right).

Apertis uses systemd-logind as a core system service, so where
disambiguation is needed, we will refer to this as a *logind seat*.

### Fast user switching

Many operating systems have the concept of *fast user switching*, which
is described in [][“Fast user switching”: switching user without logging out].
Following common usage, this document reserves the term “fast user
switching” to refer to that
particular multi-user model, even if some other model might be equally
fast or faster in practice.

## Requirements

Apertis is currently designed as a single-user system. There is one GUI
session with full access to all preferences, apps and data, and a set of
apps and user services with varying levels of sandboxing and privilege
separation from each other within that session, running on top of system
services whose privileges vary. The high-level requirement for this
document is that this should be expanded to support multiple GUI users,
each with their own private data and user services, running on top of
similar system services.

This section contains a list of general requirements applicable to many
multi-user systems.

  - Multiple users should be able to use the system. Depending on the
    specific set of requirements, this could involve concurrent use, or
    one user at a time.

  - When the user logs in to a newly started system, they should find
    the same applications they had left open last time they shut down
    the system, and in the same state. See [][Returning to previous state]
    for discussion of this topic.

  - Some data is private to each user.
    Depending on the specific set of
    requirements, this could include:

      - Settings

      - Address book

      - Browser history

      - Application icons

      - Arrangement of icons in the app launcher

      - Account data for web services

      - Playlists

  - Some data is shared between users. Depending on the specific set of
    requirements, this could include:

      - Applications (from the store)

      - Media library (music, videos)

  - Depending on the specific set of requirements, switching users at
    runtime could be supported. Where it exists, this shall be performed
    with a smooth transition, with no visual flickering. User switching
    should not take more than 5 seconds. See [][Switching between users]
    for discussion of this topic.

  - A subset of features are considered to be core functionality, and
    must not be disturbed by switching between users: they must remain
    available before, during and after any transition between users. The
    set of core functionality could vary by device; in this document we
    mainly use music playing and navigation as examples of this
    category. See [][Preserving “core” functionality across user-switching]
    for further discussion of this topic.

  - The subset of features that are not disturbed while switching
    between users must not be limited to functionality that is
    considered to be “part of the operating system”. For example, it
    should be possible to place a user-installable player for a
    third-party music streaming service such as Spotify or last.fm in
    this category. Again, see [][Preserving “core” functionality across user-switching].

  - Depending on the specific set of requirements, peripheral hardware
    devices such as USB storage devices and paired Bluetooth devices
    could either be shared across the entire system, or specific to a
    user. If they are shared, then they must be accessible to all users,
    with all users able to unmount/eject them.

  - The authentication and user-switching user interface should not
    distract the driver more than is necessary; for instance, it
    should not ask security or authentication questions unless a
    decision is strictly required.

  - The user privileges of the system should be visually obvious: if
    users have selected different personalizations such as colour
    schemes or themes, then the display should use a particular user's
    theme whenever it is acting on behalf of that user, and at no other
    time. This limits the risk that users will encounter undesired
    privacy consequences resulting from misunderstanding the system's
    privacy model.

### Distinguishing between privacy levels in user-specific data

There are several possible categories of user-specific data.

Some user-specific data is private. For instance this might include
email, browsing history, and social media feeds. (Alice should not be
able to read Bob's email, history, social media feeds and so on unless
Bob has allowed it.) Meanwhile, some user-specific data is sensitive
because it allows acting on someone else's behalf. (If Alice is logged
in to Amazon, Bob should not be able to buy things using her account.)
Private and sensitive data are interchangeable from a systems
perspective: they must be accessible by that user, and only by that
user.

However, some data is only user-specific for convenience or
organization; it isn't important whether other users are able to read
it, as long as it doesn't make their own actions less convenient.
For instance, the set of apps that are visible in menus might be one
example of user-specific data that does not necessarily need to be
treated as private. If Alice has installed apps for social media
networks that Bob doesn't use, they shouldn't appear in Bob's menus —
but if Bob specifically looks for them, perhaps in an
Android-Settings-like “storage usage” view, it might be considered
acceptable that he can see what Alice installed.

Another possibility for sharing data is that playlists within a shared
media library could appear as an unobtrusive “Bob's playlists” folder in
other users' menus, if desired.

As discussed in [][Levels of protection between users], the level of privacy and integrity
protection between users can vary according to OEM and consumer
requirements; this could influence how user-specific data is
categorized.

### Authentication

We assume that the HMI provides a way for users to identify and
authenticate themselves to a trusted HMI component, for instance by:

  - presence of a unique physical key

  - presence of a personal item such as a phone with Near-Field
    Communication support

  - a password or lock-screen gesture

  - face or fingerprint recognition

  - simply selecting a user from a menu (choice of user, but no
    meaningful authentication, similar to one of the cases described in
    [][Switchable profiles without privacy])

The exact authentication mechanism depends on manufacturer and user
requirements, and is outside the scope of this document: this document
only assumes that an identification/authentication mechanism exists as
part of the operating system, and does not rely on specific properties
of that mechanism.

### General use-cases

While this document does not go into the specifics of more elaborate
use-cases, there are a few simpler use-cases which should be considered
by any concrete multi-user design within the framework established by
this document. In some cases these use-cases could be considered and
rejected, if a particular design's requirements put them out of scope.

#### First use

Alice uses the car for the first time. The system recognises that she
has not used it previously and so there is no saved state.

**a. First use**: The system starts in some default state, for instance
at a main menu or with a default application such as a media player
running.

#### Individual use: preferences and state restored

Alice and Bob share a car, and have separate keys. Alice has configured
the display for a red UI theme; she uses the car on Monday, listens to a
podcast while she drives, and has the email app open in the background.
Bob has configured the UI for a blue theme. He uses the car on Tuesday,
and reads the BBC News website in the browser app while stopped at
motorway services.

**a. Last-used mode**: The next time Alice starts the car and
authenticates as herself (see [][Authentication]), the podcast and email apps
should resume in the same state they were in when she shut the system
down on Monday, and the HMI configuration should reflect her preferences
(the red theme should be used, etc.). Similarly, the next time Bob
authenticates as himself, the BBC News website should be displayed in
the browser app as it was when he shut the system down on Tuesday, and
the blue theme should be used.
Privacy between non-concurrent users**: If the system is configured
+to provide protection between users, then Alice's private data should
+not be available to Bob and vice versa. For instance, Bob's web browsing
+history and social media accounts should not be available when Alice
+starts the web browser, even if Alice deliberately looks for them.
+
+#### User switching
+
+**a. User switching**: Bob is currently using the HMI to read Twitter,
+and Alice wants to check her email. Neither is currently driving. Alice
+should be able to authenticate in some way (see [][Authentication]), switching
+the HMI to have Alice as its current user. When she has finished, Bob
+should be able to switch the HMI back so he is the current user again,
+and continue to read Twitter.
+
+**b. Privacy during user switching**: after switching from Bob's user
+account to Alice's, Bob should be able to go away, knowing that Alice
+cannot access his Twitter feed. When Alice has finished and hands back
+control to Bob, she should be able to know that Bob cannot access her
+email.
+
+> In existing multi-user systems like those described in
+> [][Existing multi-user models], this is typically implemented by
+> leaving Bob's user account in a “locked” state after he transfers
+> control to Alice, and vice versa, requiring re-authentication before
+> resuming use.
+
+#### Guest mode
+
+Greg, a guest, is in Diana's car.
+
+**a. Unauthenticated guest session**: If Diana has enabled it (or if it
+is enabled by default and Diana has not disabled it), Greg should be
+able to start a guest session that can access public information and the
+Web, play music from the car's music library, etc., without
+authentication.
+
+**b. Owner's privacy**: Greg should not be able to access Diana's
+private data (or the private data of any other user of the system).
+
+**c. Guest's privacy**: Greg's browser history, Facebook authentication
+token, etc. should not be available to subsequent guests. For instance,
+the system could temporarily allocate space for Greg's user-specific
+data, then discard it and terminate all guest processes as soon as Greg
+logs out, returning to default settings for the next guest.
+
+**d. Guest is restricted**: Greg should not be able to add or delete
+music, install or remove apps, or perform similar actions.
+
+#### Borrowing the car
+
+Diana lends her car to David, giving him her key.
+
+If the system is configured to consider a key as sufficient
+authentication for a user, then it cannot be expected to protect Diana
+from malicious action by David. However, if the system is configured to
+require secondary authentication such as a password, PIN or lock-screen
+swipe pattern, then David will not be able to use Diana's account.
+
+**a. Can create a new account**: Even though David and Diana are using
+the same key, David should be able to create a new account that saves
+his preferences, and switch to it.
+
+## Existing multi-user models
+
+This chapter describes the conceptual model, user experience and design
+elements used in various non-Apertis operating systems' support for
+multiple users, because it might be useful input for decision-making.
+Where available, it also provides some details of the implementations of
+features that seem particularly interesting or relevant.
+
+### Switchable profiles without privacy
+
+The simplest multi-user model can be found in platforms such as Windows
+95 and the Sony PlayStation 3. 
In these systems, certain settings and
+other pieces of application data (such as documents and saved games) are
+stored separately for each user, but there is no privacy or protection
+between users: each user can easily access other users' accounts.
+
+One variant of this is where no authentication is required to access a
+different account, as on the PlayStation 3: a user selects their name
+from a list, and there is nothing preventing them from selecting a
+different user's name instead. Similarly, an unauthorized user can
+identify themselves as any authorized user and gain access.
+
+Another variant of this is where there is meaningful authentication
+(e.g. a login step with a password), but authenticating as *any* user is
+sufficient to access *all* users' private files. For instance, Windows
+95 offered login authentication, but did not support filesystems with
+user-level permissions. As a result, unauthorized users were prevented
+in principle (in practice, the login step was easily circumvented), but
+each authorized user had the technical capability to read and write any
+other user's files by navigating to the appropriate directory.
+
+Both variants of this model are simple to implement, and provide
+straightforward semantics. Their disadvantage is that they do not meet
+typical privacy expectations for a modern operating system: users can
+impersonate one another, read each other's private files, and even alter
+each other's private files. As such, these models are only suitable for
+an environment in which every user of the system fully trusts every
+other user of the system (and, for the first variant, everyone with
+physical access to the system).
+
+We anticipate that these simple use-cases will be appropriate for some,
+but not all, Apertis systems: for example, they might be appropriate for
+a family car where the installed apps do not handle particularly
+sensitive information. In other Apertis systems, stronger
+privacy/protection between users is likely to be required.
+
+### Typical desktop multi-user
+
+Many modern desktop/laptop operating systems (such as the Windows NT
+series, Mac OS X, and various open source desktop environments on Linux
+and BSD platforms) have a similar model for how multiple users are
+handled. Apertis shares many software components with the GNOME 3
+desktop environment (as used in, for instance, Debian GNU/Linux and
+Fedora Linux), so we will use GNOME on Linux as our primary example of
+this type of environment.
+
+On Unix-derived systems such as Linux and Mac OS X, each user account is
+typically represented by one Unix uid, matching the uid mechanism's
+intended use on all Unix systems.
+
+#### Basic multi-user: log out, log in as another user
+
+The most basic form of multi-user support is considerably older than
+graphical user interfaces, and is implemented in most current
+desktop/laptop operating systems. The system boots to a login prompt at
+which the user can choose their user account (for instance by choosing
+from a list or by typing its name), and authenticate in some way
+(typically with a password, but many authentication mechanisms are
+possible).
+
+Each user has their own set of data files and configuration. To provide
+privacy between user accounts, the system tracks the ownership of user
+files, and either denies access to other users' files by default, or can
+be configured to do so.
+
+To switch between users, the first user must log out, ending their
+session; this typically also terminates most or all of their user
+services. 
Ending their session presents another login prompt, at which
+the second user can log in.
+
+In a typical implementation on Linux systems with the X11 windowing
+system, a system service (a “display manager”, such as GNOME's GDM)
+starts an X display and uses it to show the graphical login prompt. When
+the first user logs in, their uid is granted access to the X display,
+which is taken over by their session. At the end of their session, the
+display manager terminates the X server, and starts a new X server for
+the next login prompt.
+
+Systems which offer this model can easily support the simpler models
+from [][Switchable profiles without privacy] as trivial cases: they can implement the
+PlayStation 3-like model by omitting the authentication step after
+choosing a user, or the Windows 95-like model by giving each authorized
+user access permissions for other users' files.
+
+#### “Fast user switching”: switching user without logging out
+
+A refinement of the above model for systems with enough memory is to
+offer more than one parallel login session, with one active login
+session and any number of inactive sessions. This is commonly referred
+to as *fast user switching*.
+
+Again, most current desktop/laptop operating systems offer this in some
+form. The first user chooses a “Switch User...” option from a menu; this
+optionally locks the first user's session (for instance by locking their
+screensaver), and switches to a login prompt at which the second user
+can log in. To switch back, the second user uses “Switch User...” to
+access another login prompt, at which a third user can log in, and so
+on. Several users can share the system, with up to one active session
+and any number of inactive sessions (limited by system RAM, and
+optionally an arbitrary limit on the number of users).
+
+If the user logging in at the login prompt already has a login session,
+then the system detects that, and instead of starting a new session, it
+switches back to the existing session, automatically unlocking the
+screensaver if required. When a user logs out, their session is replaced
+by a login prompt at which any user can log in.
+
+Designers typically treat this model as a superset of the simpler model
+in [][Basic multi-user: log out, log in as another user]:
+in practice, implementations of “fast user switching”
+also offer the non-concurrent log-out/log-in arrangement as a trivial
+case. Similarly, as in [][Basic multi-user: log out, log in as another user],
+implementations of this model can
+easily support the models from [][Switchable profiles without privacy] as trivial cases.
+
+In GNOME's GDM display manager, the first session takes over the X
+server originally used for the login prompt, the same as in
+[][Basic multi-user: log out, log in as another user];
+this runs on a Linux virtual console, traditionally tty7. The
+“Switch User...” option causes the display manager to run a new X
+server on a different virtual console, typically tty8, and switch to it;
+the second user's session takes over that X server, and so on,
+allocating a new virtual console and running a new X server each time.
+If a user logs out, the display manager remains on the same virtual
+console, but runs a new X server for the login prompt. If the user
+logging in at the login prompt already has a login session, instead of
+taking over that X server for a new session, the display manager
+switches to the appropriate virtual console for the existing session. 
+The X server with the login prompt remains in the background, and is +re-used the next time a login prompt is required, instead of starting a +new X server: for example, a system where three users Alice, Bob and +Chris repeatedly switch between their accounts would reach a “steady +state†with four X servers on four virtual consoles (corresponding to +Alice, Bob, Chris, and the login prompt). + +Once two or more users have logged in, this model provides very rapid +switching between them: none of their applications or user services need +to be terminated or restarted. It also eliminates any loss of transient +“context†such as notifications or window positions, without needing +to implement state-saving. However, it uses a significant amount of +memory: because inactive users' applications are not terminated, two +alternating users could need up to twice as much memory as a single +user. Similarly, because the inactive users' applications are not +terminated or paused, merely disconnected from input and display +devices, they can continue to consume other resources, such as CPU time +and network bandwidth: a misbehaving application in Alice's session can +cause Bob's session to appear slow. + +#### Multi-user desktops with multi-seat support + +Some systems, in particular the systemd-logind component used in +Apertis, can be used to extend the model in [][Basic multi-user: log out, log in as another user] by offering +several so-called “seats†as defined in [][Multi-seat logind seats]. A logind seat is a +collection of display and input devices intended to be used by a single +user, offering the equivalent of section [][Basic multi-user: log out, log in as another user] independently on each +logind seat. Similarly, a system can offer “fast user switching†+([][“Fast user switchingâ€: switching user without logging out]) on some or all of the available logind seats. + +GNOME's GDM display manager switches between virtual consoles on the +first logind seat, in exactly the same way as section [][“Fast user switchingâ€: switching user without logging out]. On the +second and subsequent logind seats, it behaves as described in [][Basic multi-user: log out, log in as another user], +with this logind seat's X server remaining visible regardless of the +current virtual console, and does not offer “fast user switchingâ€. + +### Android 4.2+ + +Recent versions of Android have gained multi-user support, initially for +tablets only, then extended to phones in Android 5. + +When first started, [Android 4.2] shows a prompt for setting up the +first user account. The first user account is special in that it is +considered the administrator for the device, and can thus create, remove +and assign permissions to other users. + +Android uses separate Unix user account IDs (uids) for separating +applications from each other, so any communication or sharing between +applications was already mediated by the Linux kernel and other trusted +parts of the Android system software. The multi-user design simply +allocates a block of uids to each user, one uid per (user, application) +pair: for example, the first user (user number 0) might receive uids +u0a123 and u0a45 for two of their apps, and user number 1 might receive +uids that include u1a67. + +Because applications are already isolated from one another by their +differing uids, all interaction between apps is mediated by trusted +processes, so those trusted processes were adapted to take the user into +account when deciding permissions. 
Similarly, because apps +conventionally use Android-specific APIs to access user data, adapting +those Android-specific APIs to take the user into account is +straightforward: an application making an API call that previously +listed *all* online service accounts will now only be told about the +appropriate user's online service accounts. + +Authentication is through the usual means used by Android: each user +gets their custom lock screen and, depending on that user's settings, +types in a PIN, a password or a pattern connecting dots in a grid for +logging in. Icons representing all users are shown in the current user's +lock screen, so user switching is a matter of locking the screen (which +can be done through the 'quick settings' menu, available in the status +bar) and tapping the desired user. + +From a user interface perspective, this resembles [][“Fast user switchingâ€: switching user without logging out] +on typical desktop operating systems. +However, as an implementation detail, each user's apps are terminated +when user switching occurs, so the actual implementation is closer to +the “log out / log back in†model (section [][Basic multi-user: log out, log in as another user]). + +Some settings are global to the device, including Wi-Fi networks. All +users can change these settings, apparently, and those changes will +affect every other user. User settings and data are kept separate from +each other's. The list of applications in the user's launcher is +separate for each user, but application files are only downloaded the +first time a user asks that application to be installed, to save space. + +Because Android provides custom API for everything the application does, +the storage and reading of data and settings for each user is done +automatically by their APIs. That means applications did not have to be +modified for supporting multi-user: the fact that they already use +Android APIs to obtain directory paths and save files ensures that they +are saved to the proper place. + +### Multi-user support in the Tizen 3 automotive platform + +The multi-user architecture designed for Tizen 3 in an automotive +environment was [presented][fosdem-conf] at FOSDEM 2015. + +At a conceptual level, Tizen applications can either be installed +system-wide or for a particular user. Guest users can only use +system-wide applications; it was not clear from the presentation whether +only preinstalled applications can be system-wide, or whether separate +installable applications can also be installed system-wide. If installed +for a particular user, the application's files are copied into that +user's home directory, contrasting with the centralized app storage used +“behind the scenes†in this design document and in Android. + +The Tizen model is designed for a “multi-seat†environment as described +in [][Multi-user desktops with multi-seat support], +where several sets of grouped devices (a display, its attached +touchscreen input device, and perhaps USB sockets and/or a headphone +jack located near that display) are all attached to the same computer as +peripherals; this is an attractive model if the system is powerful +enough to provide acceptable performance on all seats, but comes with +higher performance requirements than some of the potential classes of +requirements addressed by this document. In particular, there is a focus +on the ability to move concurrent applications seamlessly from one +screen to another, following a user who moves from one seat to another. 
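+
+As an aside, the logind multi-seat model that this kind of design builds
+on can be inspected programmatically. The following is a minimal sketch
+using the sd-login API from libsystemd (assuming libsystemd is
+available), listing each seat and the user whose session is currently
+active on it:
+
+```c
+/* Build with: gcc seats.c $(pkg-config --cflags --libs libsystemd) */
+#include <stdio.h>
+#include <stdlib.h>
+#include <systemd/sd-login.h>
+
+int main(void)
+{
+    char **seats = NULL;
+    int n = sd_get_seats(&seats);
+
+    if (n < 0) {
+        fprintf(stderr, "sd_get_seats failed: %d\n", n);
+        return 1;
+    }
+    for (int i = 0; i < n; i++) {
+        char *session = NULL;
+        uid_t uid;
+
+        /* Which session, and therefore which user, owns this seat now? */
+        if (sd_seat_get_active(seats[i], &session, &uid) >= 0) {
+            printf("seat %s: active session %s, uid %lu\n",
+                   seats[i], session, (unsigned long) uid);
+            free(session);
+        } else {
+            printf("seat %s: no active session\n", seats[i]);
+        }
+        free(seats[i]);
+    }
+    free(seats);
+    return 0;
+}
+```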
+ +In the Tizen model, all users share a single compositor, which manages +all seats' displays and input devices, resulting in the compositor being +required to act as part of the TCB for security between users (see +[][Trusted components]). As discussed further in [][Graphical user interface and input], +we do not recommend this approach while using X11 for GUI services. + +There is a single privileged user in the Tizen system, and only that +user can configure certain shared resources such as wireless networking +and Bluetooth. This seems an unnecessarily limiting model for a car that +might be shared between two or more primary drivers, for example in a +family. It is intended that this user will eventually be able to launch +applications on seats that are currently in use by other users. + +The API model in Tizen appears to involve system services such as the +media server and thumbnail generation service not only acting on behalf +of users to fulfill requests, but running as 'root' so that the same +application can write directly into multiple users' home directories. We +recommend avoiding this practice: it puts all of those services into the +TCB for each layer of the security model (security between users, +security between apps and security between system services), greatly +increasing the amount of security-sensitive code in the system and the +potential impact of a bug or security flaw. + +The presentation mentioned adding the user ID as an explicit parameter +in IPC (inter-process communication) calls from applications to system +services so that the system service will act on behalf of the +appropriate user. This could be made to work securely by verifying that +the actual user ID matches the one in the IPC call, but is a potentially +dangerous approach: if a naive implementation trusts the given parameter +and does not verify it, a malicious application could easily subvert +that implementation. We recommend avoiding “user ID†parameters in APIs: +if the service can determine the user ID in a secure way, then the +parameter is unnecessary, and if it cannot, this approach brings the +calling application into the TCB for security between users (with the +practical result that all or nearly all applications would end up in the +TCB, greatly increasing the system's attack surface). + +## Approach + +Because this document does not define precise requirements or use-cases +for the system, this section outlines multiple possible approaches to +several design questions. The choice between these approaches must be +made based on concrete requirements. + +### The principle of least-astonishment + +One valuable general design principle is that, when a user carries out +an action, it should be easy to predict the outcome. 
In the context of a
+multi-user system, this implies various more concrete principles, such
+as:
+
+  - sharing should not occur when a user would not expect it to; this
+    “over-sharing” is likely to lead to users distrusting the system
+    and being unwilling to store private data in it, even if that would
+    be advantageous
+
+  - sharing should occur when a user would expect it to; if it does not,
+    users will be inconvenienced by having to copy data manually between
+    different contexts
+
+  - performing a similar action in different contexts should have a
+    similar result
+
+### Levels of protection between users
+
+There is a spectrum of possible sets of requirements for privacy and
+integrity protection between users: a strongly protected model similar
+to the one detailed in section [][Typical desktop multi-user], a model with no protection at all as
+described in [][Switchable profiles without privacy], or anything in between (e.g. with protection
+between users in general, but certain categories of data explicitly
+shared).
+
+The desired level of protection depends on the user, but we could also
+decide that Apertis will only support a subset of the possible range,
+and an OEM could decide that they will only support a subset of the
+range allowed by Apertis.
+
+In use-cases that involve differently-privileged users, the desired
+level of protection might vary between users within a system: for
+instance, the main users of a car might opt for a setup in which
+switching from one main user to another does not require authentication,
+but switching from a “guest” user to a main user does.
+
+For each set of requirements, we aim to minimize the “friction” in
+switching between users, subject to whatever minimum is imposed by the
+requirements – stronger privacy and integrity protection comes with a
+higher minimum “friction”. For example, if users are to be protected
+from each other, then switching between users must include an
+authentication step, whereas if there is no effective protection
+(privilege boundary) between users, switching between users merely
+requires choosing the desired user account.
+
+As a general design principle, design documents for concrete use cases
+should address the “strongest” supported protection between users,
+because that imposes the most difficult privacy/integrity requirements.
+Secondarily, they should consider the “weakest” supported protection
+between users, because that imposes the most general sharing
+requirements: ideally, this is just a trivial case of the high-privacy
+version, with some of the “pain points” omitted, but it does introduce
+new requirements for the ability to pass data between users. All other
+levels of privacy/integrity protection can be represented as somewhere
+between those extremes.
+
+If we find situations that cannot be solved in a higher-privacy model,
+a possible compromise is to relax our requirements and declare the
+highest-privacy use cases to be out of scope.
+
+### User accounts: representing users within the system
+
+There are three possible approaches to representing users in a Linux
+system.
+
+#### Sharing one uid between all users
+
+In this approach, all user applications and user services run under the
+same uid. The system defines its own proprietary “user account” concept,
+and all components that access user-specific data must ensure that they
+access the correct user's data, disallowing access to other users' data
+if appropriate. 
+ +This has the potential to make transitions between users very easy: the +“current user†is simply a variable within each application or +service. However, it places a great deal of trust on each of these +components, including every third-party (user-installable) application +that accesses user-specific data. If the system's security model is that +users can be protected from each other, then in effect, all of these +components are included in the trusted computing base; if the +requirements do not include protection between users, then +distinguishing between users is not required for security, but is still +required for correctness. In practice, we anticipate that not every +component would discriminate between users correctly. + +This approach also has practical problems for the re-use of existing +open source components, which assume the traditional use of one uid per +user. Having to modify all of these components, with a complex change +that is unlikely to be accepted by their upstream developers, would +significantly reduce the competitive advantage derived from their use. + +As a result of these disadvantages, we do not recommend this approach +for Apertis. It would only be viable if all of the following are true: + + - users are not protected from each other, and this will not change in + future development + + - user-specific data is minimal, only needs to be accessed via + Apertis-specific APIs, and this will not change in future + development + + - it is not considered to be a significant problem if third-party + applications and services do not consistently distinguish between + users, and this will not change in future development + +An additional consideration for this approach is that it potentially +alters a large number of interfaces (such as D-Bus method calls) to have +a parameter for the user account to be affected. If changing +requirements result in switching to the “one uid per user†or “many uids +per user†models in future, such that the correct user account is +implicit in the uid, then this vestigial parameter will remain in the +interface, making the interface more complex than is required. + +If the form of the additional parameter resembles the numeric or string +form of a uid, then this could even lead to security issues, for +instance if a component trusts the explicit user-account parameter and +ignores the actual uid. + +If this approach is taken, then we recommend reducing the confusion +caused by naming the additional parameter something more similar to +“profile†than “userâ€. If the system is later extended to have one uid +per user, rendering the parameter vestigial, we recommend giving it a +neutral, constant value that does not match any user account name, such +as “defaultâ€. + +#### One uid per user + +The traditional Unix design which motivated the uid concept is that each +user account is represented by one numeric uid. + +Because each process (i.e. each application or service) starts with a +particular uid, and processes without administrative privileges cannot +change their uid while running, this approach requires that +user-switching involves starting new processes for the new user. + +The major advantage of this approach is that it is how the existing +components in the system, including the Linux kernel, are designed to +operate. In particular, the Linux kernel provides privacy and integrity +protection between uids. + +We recommend this approach for Apertis. 
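+
+To illustrate why this approach ties user-switching to process lifetime,
+the sketch below shows the privilege-dropping steps needed to start a
+process as a different user; `spawn_as_user` is a hypothetical helper,
+and in a real Apertis system this work would be delegated to
+systemd-logind and PAM rather than performed by hand:
+
+```c
+#include <grp.h>
+#include <pwd.h>
+#include <stdio.h>
+#include <sys/types.h>
+#include <unistd.h>
+
+/* Hypothetical helper: run `argv` under the named account's uid. */
+static int spawn_as_user(const char *account, char *const argv[])
+{
+    struct passwd *pw = getpwnam(account);
+    pid_t pid;
+
+    if (pw == NULL) {
+        fprintf(stderr, "unknown account: %s\n", account);
+        return -1;
+    }
+    pid = fork();
+    if (pid == 0) {
+        /* Drop privileges irreversibly: supplementary groups first,
+         * then the primary gid, then the uid itself. An unprivileged
+         * process cannot undo this, which is what provides the
+         * kernel-enforced boundary between users. */
+        if (initgroups(pw->pw_name, pw->pw_gid) != 0 ||
+            setgid(pw->pw_gid) != 0 ||
+            setuid(pw->pw_uid) != 0)
+            _exit(127);
+        execv(argv[0], argv);
+        _exit(127);
+    }
+    return pid > 0 ? 0 : -1;
+}
+```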
+
+#### Multiple uids per user
+
+Android uses a design involving multiple uids per user, one per app or
+set of related apps, as described in [][Android 4.2+]. This allows the Linux
+kernel's privacy and integrity features to be used to protect apps from
+other apps, even within a user session. However, in Apertis, this
+advantage is redundant, since we already use a different kernel feature
+(AppArmor) to provide privacy and integrity protection between apps.
+
+The major disadvantage of this approach is that it requires every
+interaction between dissimilar apps to be mediated by a system-level
+component. Within the context of Android, this is not a problem, since
+Android applications and services are expected to use Android-specific
+APIs in any case. However, Apertis re-uses existing open source
+components where appropriate; these components would have to be modified
+to cope with crossing privilege boundaries when they communicate with
+different uids, which, as in the “one shared uid” approach, would reduce
+the value of re-using these components.
+
+We do not recommend this approach for Apertis.
+
+### Creating and managing user accounts
+
+Based on the description of desired use case scenarios, Collabora
+understands the main means of identifying and authenticating a user will
+be through their own personal car key. This means a key with a unique
+ID will have to be issued to each user of the car.
+
+Because most cars require the key to remain inserted while the car is in
+use, if runtime user-switching is required, a secondary form of
+authentication is likely to be needed. This could be done via a
+password (or equivalent, such as a PIN or touchscreen swipe pattern),
+via biometrics such as fingerprint, face or voice recognition, or by
+verifying possession of a near-field communication device such as a
+mobile phone.
+
+As previously noted, depending on manufacturer and consumer
+requirements, there is the possibility of simpler authentication schemes
+for less privacy-conscious users; for instance, a manufacturer or
+consumer could choose to relax the security model to one where a car key
+is sufficient to authenticate as any registered user selected from a
+menu.
+
+A registration process will be required, to associate authentication
+tokens with user accounts: one way this could work is detailed in this
+section.
+
+#### Registering the users
+
+After the car has been bought, the owner is provided with a number of
+keys, one of which is handed to each user. Each user in turn then
+follows this procedure:
+
+1. User inserts the key and starts the vehicle
+
+2. The Apertis system starts up and recognizes that the key is
+   unregistered
+
+3. A wizard is displayed to register the new user
+
+4. The user enters whatever information is needed to set up their user
+   account, such as their name
+
+5. The user is given the option of registering a password or other
+   authentication tokens to be used for keyless authentication (for
+   user switching, mainly)
+
+6. Alternatively the wizard can continue from here on to register email
+   and web accounts the user may be interested in
+
+In case there are more users than keys available, new keys will need to
+be acquired.
+
+#### The first user to be registered is special
+
+It's important that at least one user be able to perform administrative
+tasks, such as wiping out all of the data, removing users, and so on.
+
+One practical solution is to treat the first user to be registered as
+special: they are able to perform these tasks, and are also able to
+grant these privileges to other users as they see fit, so that more
+users can perform administrative tasks.
+
+One analogy used in the security literature is that the system
+“imprints” on the first user seen, in the same way that a duckling
+imprints on its parent. A refinement of this model is that deleting all
+users resets the system to a state in which the next user created will
+be privileged, the so-called “[resurrecting duckling]” model.
+
+
+> Frank Stajano and Ross Anderson. *The Resurrecting Duckling:
+> Security Issues for Ad-hoc Wireless Networks*. In B. Christianson,
+> B. Crispo and M. Roe (Eds.). *Security Protocols, 7th International
+> Workshop Proceedings*, Lecture Notes in Computer Science, 1999.
+
+#### Premium segment considerations
+
+The markets targeted by Apertis systems are segmented. Upper segment
+cars do not necessarily require the key to be kept in the ignition
+while the car is on. For those kinds of systems, key proximity could be
+used as an authentication factor, allowing login for all users whose
+keys are in the car.
+
+#### Possible trade-offs and their consequences
+
+As discussed previously, the authentication system is one of the
+problematic areas that might need trade-offs. The main means of
+authentication being considered at the moment is the car key owned by a
+user.
+
+The fact that most cars require the key to remain in the ignition barrel
+to keep the car working makes it impossible for a different user to log
+in. This indicates the need for an alternate authentication method, such
+as a password, which would probably need to be registered with the
+system when the users first register themselves by using the key.
+
+Should that solution be deemed not good enough, then disallowing user
+switching at runtime will be considered, requiring the car to be turned
+off and on with a different key for logging in with another user.
+
+### Graphical user interface and input
+
+As of May 2015, the graphics layer of Apertis is based on the Mutter
+window manager/compositor, with an Apertis plugin added to provide the
+desired UX, all running on the X display server. However, the intention
+is to migrate from X to Wayland for display access in the near future.
+The X Window System was designed for the more trusting environment of
+1980s academic computing, and does not provide an effective security
+boundary between applications (for example, applications can eavesdrop
+on other applications' input events and output frames); in the context
+of a multi-user system which might require differently-privileged
+windows to share a display, this is a compelling reason to prefer
+Wayland.
+
+This section explores several potential models for managing input and
+output.
+
+The basic infrastructure component for Wayland is a *compositor*, which
+is responsible for mapping application-supplied surfaces (windows) into
+the visible display, routing input events to those surfaces, and
+applying any visual effect with a larger scope than an individual
+application, such as animated transitions between applications.
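+
+For orientation, the sketch below (assuming libwayland-client) shows how
+a client reaches its compositor. Because the compositor listens on a
+socket inside `XDG_RUNTIME_DIR`, which is a per-user directory protected
+by ordinary filesystem permissions, a client can normally only connect
+to the compositor belonging to its own uid:
+
+```c
+/* Build with: gcc connect.c $(pkg-config --cflags --libs wayland-client) */
+#include <stdio.h>
+#include <wayland-client.h>
+
+int main(void)
+{
+    /* Connects to $WAYLAND_DISPLAY inside $XDG_RUNTIME_DIR
+     * (typically /run/user/<uid>). */
+    struct wl_display *display = wl_display_connect(NULL);
+
+    if (display == NULL) {
+        fprintf(stderr, "cannot connect to a Wayland compositor\n");
+        return 1;
+    }
+    printf("connected to this session's compositor\n");
+    wl_display_disconnect(display);
+    return 0;
+}
+```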
+ +In the current design proposal for switching single-user Apertis to +Wayland, the compositor is a Wayland version of Mutter, with a version +of the Apertis UX plugin that has similarly been adapted for Wayland; +this design is analogous to GNOME 3's Shell, which also uses the Mutter +libraries for window management and compositing under either X or +Wayland. One alternative that has been considered is to use the +Wayland-specific Weston compositor instead of Mutter, again with a +plugin or extension to provide the desired UX. From the perspective of +this document, either Mutter or Weston is viable, and neither is +preferred over the other from a multi-user perspective. + +The Wayland compositor is part of the TCB for security between apps: it +is responsible for imposing a boundary between the apps that communicate +with it, and preventing them from carrying out undesired actions such as +reading each other's input or taking screenshots of each other's +windows. Depending on the design and implementation, it may also need to +be part of the TCB for security between users. + +#### Single compositor + +One possible model is to have a single compositor which starts on boot, +runs until shutdown, and is directly responsible for compositing all +application surfaces. This model would be appropriate if there is only +one uid shared by all users as described in section [][Sharing one uid between all users], since in that +model there is no OS-level isolation between user accounts in any case. +It could potentially also be used in a design where each user has their +own uid, by running the compositor with a non-user-specific uid. + +The major disadvantage of this situation is that it places the +user-level compositor into a trusted position: it would become part of +the trusted computing base for separation between users (see [][Trusted components]). +Mutter is not typically used like this, and has not been designed +or audited for this use. Other compositors would need to be carefully +checked for safety for this use. As a general design principle, the less +code is in the trusted computing base (for any given layer of security), +the better; this conflicts with the user-level compositor's broad role +in mediating between apps, including animated transitions, copy/paste +functionality, on-screen keyboard handling and so on. + +#### Nested compositors + +Another possible approach is to make use of *nested compositors*. In +this model, a *system compositor* starts on boot, runs until shutdown, +and is responsible for compositing surfaces provided by system-level +components. Instead of surfaces supplied by applications, the system +compositor would primarily be responsible for compositing surfaces +supplied by one or more *session compositors*, and routing input events +to an appropriate session compositor: in effect, it treats the session +compositors like ordinary applications. + +The system compositor would run under a system (non-user-specific) uid, +while the session compositors would run under an appropriate uid for +their respective users. + +We do not recommend this approach. This design was suggested during +early upstream design work on Wayland, but is now strongly discouraged +by Wayland developers. One major issue is in dealing with input events. 
+Mediating every input event through two layers of compositor would +increase latency, limiting responsiveness, so it is desirable to grant +user sessions direct access to input events; but granting direct access +to session compositors nested inside a system compositor is problematic, +and would cause conflicts between the roles of the system compositor and +the systemd-logind service. + +Another reason to prefer other models is the increased complexity of the +system as a whole in this model. + +#### Switching between compositors + +The traditional design for user-switching in X, as described in +[][Basic multi-user: log out, log in as another user] and +[][“Fast user switchingâ€: switching user without logging out], +is to start a new X server for each user session and +switch between them, for instance by using the Linux kernel's “virtual +console†facility, or by dynamically attaching/detaching the X servers +to the video device. It would be possible to do the equivalent in a +Wayland environment, by running multiple session compositors, switching +access to the video output between them, and not having a system +compositor. + +In this model, the transition between users would involve systemd-logind +revoking the old session compositor's control over the display (“DRM +master†status) and over input devices, and giving control to the new +session compositor. This could be done at any point in the transition: +before, after or during an animated transition. + +The major disadvantage of this design is that switching between virtual +consoles is an all-or-nothing operation: the system either displays a +frame from one compositor or a frame from another, but it cannot combine +two (for instance by overlaying them, with transparent regions). It is +also not instantaneous, and would have to be disguised by having a +transition where several consecutive frames are allowed to be the same. + +For some UX designs, this would not matter. For example, if a designer +specifies that the first user's session should “fade out†to a black +screen or some sort of “please wait...†placeholder, or move off-screen, +then the system could switch to a matching frame in the new compositor, +wait for the switch to occur, and have the second user's session “fade +in†or move in from off-screen. Similarly, if the UX for user-switching +involves a menu from which the new user is chosen, then that menu could +be used as a fixed point around which to anchor the transition. + +However, if the desired transition has the two users' sessions overlap – +for instance, a full-screen cross-fade from one to the other, or any +animated movement that has both sessions exist on-screen at the same +time – then it would be difficult to achieve these effects in this +design without essentially copying a static screen-capture of one +session into the other session. Similarly, if the desired transition has +smooth movement from beginning to end – for example, smooth horizontal +scrolling with the conceptual model that the other user's session is +“just off-screen†– then the only practical points at which to do the +virtual console switch would be at the very beginning or at the very +end; either way, this would likely result in a few frames of +non-responsiveness at a time when the user might reasonably expect the +system to be responsive. 
+
+Copying a screen-capture of one session into the other session is also a
+potential privacy risk, since it results in the screen contents crossing
+the trust boundary: it would be technically possible for the second
+user's session to save the captured image.
+
+#### Switching between compositors with a system compositor
+
+Because Wayland does not require clearing the framebuffer during
+switching, another possible approach would be to use a system-level
+compositor without nesting, used for transitions, and optionally for
+startup and shutdown. At any given moment, either the system-level
+compositor or a session compositor would be active (have control over
+input and output), but never both.
+
+In this model, as in [][Switching between compositors],
+the transition between users would involve
+systemd-logind revoking the old session compositor's control over the
+display (“DRM master” status) and over input devices; however, instead
+of immediately giving control to the other session, it would
+give control to a special-purpose system-level compositor which would
+perform the transition, and then in turn hand over to the new session.
+This system-level compositor could capture the current screen contents
+as a starting point for the animated transition, if desired; as in
+[][Switching between compositors], the screen contents would cross a privilege boundary, but unlike
+[][Switching between compositors], the other side of the privilege boundary in this design is a
+trusted process.
+
+The new session compositor could be started without direct access to the
+display (it would not yet be the “DRM master”), and instructed to draw
+its initial state into a buffer; recent Linux kernel enhancements mean
+that it could use in-GPU processing and memory for this drawing
+operation, without having control over what is displayed. The
+system-level compositor would use that buffer as the endpoint of its
+animated transition. On completing the transition, it would instruct
+systemd-logind to grant full display and input access to the new session
+compositor.
+
+As a result of its role in user-switching, the system-level compositor
+used for the transition would potentially be part of the TCB for
+security between users. However, its functionality would be minimal:
+because it would not be active during normal use, only during
+transitions, it would not necessarily need to process input at all, and
+its output handling would be limited to performing the animation from
+the old to the new screen contents.
+
+### Switching between users
+
+If runtime switching between users is required, there is a spectrum of
+possible approaches.
+
+At one extreme is the simplest form of the approach described in
+[][Basic multi-user: log out, log in as another user], where we
+terminate all of the newly inactive user's apps and user
+services (anything that is user-specific), and only non-user-specific
+processes (system services) continue to run. That has the lowest
+possible memory and CPU overhead: there is going to be a small amount of
+overhead during the necessary “grace period” while we let the inactive
+user's apps save their state before killing them, but this is minimized.
+
+At the opposite extreme is “fast user switching” as described in
+[][“Fast user switching”: switching user without logging out], in which
+the inactive user's entire session (GUI apps, user services, games, and
+infrastructure components such as the window manager and X server or
+session compositor) continues to run, with the only difference being
+that it is disconnected from the input
+and display hardware. That has considerable overhead: in the worst case,
+where we assume that system services are negligible when compared with
+per-user components, switching between two users could double the memory
+and CPU consumption.
+
+We can choose various points along that spectrum depending on OEM and
+customer requirements. If we can terminate all of the inactive user's
+apps and the majority of their user services, the result is close to the
+first extreme – for example, this could be based on an “agents continue
+to run across user-switching” flag in the app manifest, perhaps
+implemented as an Android-style “permission”. App-store curators could
+carry out more thorough validation on services that request that flag,
+to ensure that they will not have an adverse performance impact.
+
+If we can terminate all of their apps but must leave *all* of their user
+services running, we get closer to the second extreme. The closer we are
+to the second extreme, the higher our hardware requirements for a given
+performance level will be.
+
+If we terminate at least some of the newly inactive user's processes, a
+second axis of variation is how much overlap we are prepared to tolerate
+between the sessions: to allow those processes to save their current
+state, a “grace period” will be required between notifying those
+processes that they must exit, and actually terminating them.
+
+One approach is to disallow overlap entirely, and not start the
+transition until the inactive user's session has completely ended, with
+a “please wait...” message while their processes shut down. However,
+this maximizes latency and user-visible disruption. To reduce the time
+required to switch between users, it might be desirable for these
+processes to continue to run concurrently for a short time, in parallel
+with starting the newly active user's session. There is a trade-off
+here: the more CPU time is consumed by the newly inactive user's
+processes, the less is available to display a smooth animated transition
+to the newly active user and launch *their* processes. This could be
+mitigated by de-prioritizing the CPU and bandwidth consumption of the
+inactive user's apps, at the cost of extending the necessary “grace
+period” for a given amount of state-saving activity: for example, if an
+app's state-saving procedure would normally take 50% of the CPU for 0.1
+seconds, throttling that app to 5% of the CPU would make its shutdown
+take 1 second.
+
+### Preserving “core” functionality across user-switching
+
+If user-switching during use is supported, then certain features of the
+system must continue to work during and after the user switching
+operation.
+
+For example, navigation-related notifications (notifying the driver that
+they should turn off their current route soon, that the speed limit will
+change soon, etc.) are time-sensitive, and it would be reasonable to
+require that these notifications are not interrupted or delayed, even if
+user switching takes place just before or even during the notification.
+ +Further examples of background features that might be in the category +that must not be interrupted include media playback (if the driver is +listening to music, it would be reasonable to require that playback is +not stopped or disrupted by user switching, although interrupting “now +playing...†notifications might still be acceptable) and incoming phone +or VoIP calls. + +These features cannot be assumed to be a fixed part of the operating +system: for example, it should be possible to have uninterrupted media +playback via a third-party audio streaming app, such as one for last.fm +or Spotify, or uninterrupted VoIP call notifications for a third-party +VoIP implementation. + +Conversely, essential operating system features such as preinstalled or +non-removable apps are not necessarily all in the category of features +that must continue to work during user-switching: for example, incoming +email notifications are less time-critical than calls, and it is likely +to be acceptable for them to be paused during user-switching. + +There are several possible approaches to keeping these features working +across a user-switch. Depending on the concrete requirements and use +cases, we could choose one of these approaches for the whole system, or +choose some combination of them for different apps and services. + +As mentioned briefly above, there is the potential for a subtle +distinction between components where an interruption to notifications is +unacceptable (for instance, navigation or incoming calls might be in +this category), and components where an interruption to functionality is +unacceptable, but an interruption to notifications is allowed (or even +desirable). + +For a possible example of the second category, consider music playback, +on a system where a visual notification is triggered when the current +track changes. Suppose we switch the current user from Alice to Bob at +12:00:00, at which time track 1 is 2 seconds from ending, and the +animated transition takes 4 seconds. It seems reasonable to expect that +track 1 must continue to play until 12:00:02, and it also seems +reasonable to expect that track 2 must start at 12:00:02 and continue to +play smoothly. However, it is not necessarily a requirement that the +“now playing track 2†notification cannot be delayed until Bob's +session becomes fully available at 12:00:04; indeed, this might be +considered more desirable than having it interrupt the animated +transition. + +#### System services + +System services (as defined by [][System services]) continue to run regardless +of what is happening in user sessions, so one possible approach is to +put “core†functionality in system services. These could be anywhere +from highly privileged to entirely unprivileged; the distinction here is +only that they are independent of user accounts. + +For example, network management services such as ConnMan are +highly-privileged system services, whereas the Avahi name-resolution and +service discovery service is system-wide but unprivileged. + +If this approach is to be used for third-party installable applications, +then we will need to ensure that third-party application bundles can +provide system services, in a way that does not allow those third-party +application bundles to compromise the overall security of the system. 
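+
+Echoing the earlier recommendation to avoid explicit “user ID”
+parameters in IPC, a system service should derive the calling user from
+kernel-verified credentials rather than from anything the caller sends.
+A hedged sd-bus sketch follows; the bus name `com.example.Prefs` and the
+`theme_for_uid` helper are hypothetical:
+
+```c
+/* Build with: gcc prefsd.c $(pkg-config --cflags --libs libsystemd) */
+#include <systemd/sd-bus.h>
+
+/* Hypothetical per-user lookup; a real service would consult its store. */
+static const char *theme_for_uid(uid_t uid)
+{
+    return (uid % 2) ? "blue" : "red";
+}
+
+static int method_get_theme(sd_bus_message *m, void *userdata,
+                            sd_bus_error *ret_error)
+{
+    sd_bus_creds *creds = NULL;
+    uid_t caller;
+    int r;
+
+    /* Ask the bus for the sender's effective uid; unlike a message
+     * parameter, this cannot be forged by the calling application. */
+    r = sd_bus_query_sender_creds(m, SD_BUS_CREDS_EUID, &creds);
+    if (r < 0)
+        return r;
+    r = sd_bus_creds_get_euid(creds, &caller);
+    sd_bus_creds_unref(creds);
+    if (r < 0)
+        return r;
+
+    /* Reply with this user's data only. */
+    return sd_bus_reply_method_return(m, "s", theme_for_uid(caller));
+}
+
+static const sd_bus_vtable prefs_vtable[] = {
+    SD_BUS_VTABLE_START(0),
+    SD_BUS_METHOD("GetTheme", "", "s", method_get_theme,
+                  SD_BUS_VTABLE_UNPRIVILEGED),
+    SD_BUS_VTABLE_END
+};
+
+int main(void)
+{
+    sd_bus *bus = NULL;
+
+    if (sd_bus_open_system(&bus) < 0)
+        return 1;
+    sd_bus_add_object_vtable(bus, NULL, "/com/example/Prefs",
+                             "com.example.Prefs", prefs_vtable, NULL);
+    sd_bus_request_name(bus, "com.example.Prefs", 0);
+    for (;;) {
+        int r = sd_bus_process(bus, NULL);
+        if (r < 0)
+            break;
+        if (r == 0)
+            sd_bus_wait(bus, (uint64_t) -1);
+    }
+    sd_bus_unref(bus);
+    return 0;
+}
+```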
+ +For components that deal with user-specific data, making the component +into a system service requires that the component is trusted to provide +the correct privilege separation: for example, if the component has +access to multiple users' private data, it should not reveal one user's +private data to another user unless the system's security model allows +this to happen. + +As a general design principle to avoid circular dependencies and +unnecessarily tightly-coupled components, lower layers should not rely +on higher layers. System services are at a low layer in the stack, so +they should not initiate communication with user services or users' +graphical sessions. One common approach to this is to have a component +inside each user session whose role is to provide the user interface for +a “headless†system service, separating backend logic and system-level +configuration (the system service) from user interface presentation and +per-user configuration (the user part). + +#### User services continuing to run + +User services (as defined by [][User services]) are inherently per-user. If +the end of a user's login session terminates their GUI applications but +leaves some or all of their user services running, this could increase +system load (as noted in section [][Switching between users]), but would make user services a +suitable implementation for features that must run uninterrupted. This +could apply either in general, or with restrictions (for example, some +subset of the inactive user's user-services could continue to run, +perhaps according to a “flag†in their associated app manifests). + +#### Distinguishing between the driver and other users + +Because the driver is the primary user of the system, one possible +refinement of this requirement would be to say that core functionality +associated with the driver cannot be interrupted, and must retain its +ability to display notifications, but that switching may interrupt +functionality associated with other users. This would limit the +additional system load from multiple users: the maximum set of processes +running at a given time would be one non-driver's full session, plus +whatever subset of the driver's processes are considered to be +necessary. + +#### Agents + +The Apertis design has the concept of “agentsâ€, which are lightweight +background processes running on behalf of a user. Depending on the +precise requirements for agents, they could be implemented as system +services, or as user services, or divided between those two categories. + +### Returning to previous state + +Saving and restoring the state of the session is a hard problem in +general. Some platforms, such as Android, made it a central piece of +their application life cycle management and built it right into the +application support for the platform. The fact that Android and iOS have +custom platform layers allows them to make this viable. + +Collabora is not aware of any deployment of OS-level freezing and +thawing of processes at the moment, but such a strategy could be +investigated in the future for usage in Apertis. For now, having the +application itself care about saving and restoring state, even if +supported by some high level API, seems to be the more realistic +approach. More discussion about this can be found in the +[Applications design] document. 
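+
+As a sketch of what application-side state saving might look like
+(assuming GLib is available to applications, as the Apertis stack is
+GNOME-based; the file layout shown is purely illustrative):
+
+```c
+#include <glib.h>
+
+/* Save a couple of pieces of session state to a per-user file. Because
+ * g_get_user_data_dir() resolves to a per-user location, each user's
+ * state is saved and restored independently. */
+static gboolean save_state(const char *last_url, int scroll_pos)
+{
+    GKeyFile *state = g_key_file_new();
+    char *dir = g_build_filename(g_get_user_data_dir(), "example-app", NULL);
+    char *path = g_build_filename(dir, "state.ini", NULL);
+    GError *error = NULL;
+    gboolean ok;
+
+    g_mkdir_with_parents(dir, 0700);
+    g_key_file_set_string(state, "session", "last-url", last_url);
+    g_key_file_set_integer(state, "session", "scroll-pos", scroll_pos);
+
+    ok = g_key_file_save_to_file(state, path, &error);
+    if (!ok) {
+        g_warning("could not save state: %s", error->message);
+        g_error_free(error);
+    }
+    g_free(path);
+    g_free(dir);
+    g_key_file_free(state);
+    return ok;
+}
+```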
+ +### Application ownership and installation + +In current app-store platforms such as Apple, Google Play, Steam or +PlayStation Store, if you buy an app, it is associated with your +personal account (Apple, Google, etc.) and can be downloaded to any +device associated with that account, subject to some limits. This is one +possible approach to how apps are deployed on Apertis. + +To avoid wasting space with duplicate application installations, current +app-store implementations with multi-user support, such as Android, have +chosen to install applications system-wide. If Apertis apps are, +conceptually, installed per-user, then we recommend implementing this by +keeping a list of apps per user, and merely hiding apps from users who +have not “installed†that app. If the user acquires an app that another +user has already installed, the system could behave as though it was +freshly downloaded, but in fact just stop hiding the system-wide app +from the current user: from the user's perspective, this is +indistinguishable from a very fast download and installation. + +Another potential conceptual model is to treat apps as more like car +accessories. You could, for instance, buy a car with metallic paint, or +add alloy wheels later; when you sell the car, the feature goes with it. +Applying this model to applications, it could be possible to buy a car +with the social media app bundle preinstalled, or add the media +streaming bundle later, and have the apps go with the car when it is +sold. In some respects, this is the more natural model from the +implementation point of view: we do not recommend duplicating the app's +executable code and resources, regardless of whether it is conceptually +installed per-user. + +Whichever of these approaches is taken, choosing whether +ownership/licensing of the app follows the car or the purchaser is +primarily a matter for the app store implementation, not the multi-user +design. + +## Summary of recommendations + +As discussed in [][User accounts representing users within the system], +Collabora recommends representing each user +account as a Unix user ID (uid). The first user to be registered in a +new system must be able to perform administration tasks such as system +updates, application installation, creation of new users and setting up +permissions – that is discussed in [][Creating and managing user accounts]. + +There is a range of possible approaches to switching between users, +discussed in section [][Switching between users]. This document does not recommend a particular +choice from that range, since it depends on the available hardware +resources and the system's use-cases and requirements. For +budget-limited designs with significant hardware limitations, we should +consider terminating most user-level processes while switching to reduce +concurrency, or if this is not acceptable, opt to leave user-switching +unsupported; for premium models with more capable hardware, the more +resource-expensive “fast user switching†approach can be considered. + +In [][Preserving “core†functionality across user-switching] +we outline various possible approaches to ensuring that +“core functionality†is not interrupted by a user switch. Services +that need to stay running after a user switch should have their +background functionality split from their UIs; they can either run as a +different Unix user account ID – a “system service†– or be a specially +flagged “user service†that is not terminated with the rest of the +session. 

In [][Returning to previous state], Collabora recommends that
applications handle saving and restoring their state themselves,
potentially supported by helper SDK APIs, which means that only
applications written with Apertis in mind would benefit. That
recommendation follows from the fact that no single solution would work
for all applications.

Ways of having a smooth visual transition when switching users are
discussed in section [][Graphical user interface and input]. Collabora
recommends revisiting this topic after Apertis' graphical user interface
and input processing has been switched from X to Wayland; our
provisional recommendation is to implement a hand-off procedure between
compositors running under the appropriate user ID, either with (section
[][Switching between compositors with a system compositor]) or without
(section [][Switching between compositors]) an intermediate switch to a
system compositor.

[Android 4.2]: http://developer.android.com/about/versions/jelly-bean.html#android-42

[fosdem-conf]: https://fosdem.org/2015/schedule/event/embedded_multiuser/

[resurrecting duckling]: https://www.cl.cam.ac.uk/~fms27/duckling/

[Applications design]: https://wiki.apertis.org/mediawiki/index.php/ConceptDesigns
diff --git a/content/designs/op-tee.md b/content/designs/op-tee.md
new file mode 100644
index 0000000000000000000000000000000000000000..a43d6eb5ca45e8a67317b0b8190b7c8bb03c19ba
--- /dev/null
+++ b/content/designs/op-tee.md
@@ -0,0 +1,194 @@
---
title: OP-TEE integration
short-description: Discussing and detailing an approach to the integration of OP-TEE as a Trusted execution environment
authors:
  - name: Martyn Welch
---

# Integration of OP-TEE in Apertis

Some projects that wish to use Apertis have a requirement for strong security measures to be available in order to implement key system-level functionality.
A typical use case is enabling the decryption of protected content in such a way that the owner of the device doing the decryption cannot access the decryption keys.
Another use for strong security is the protection of authentication keys.
By shielding such keys within these strong security measures, it becomes much harder for the keys to be stolen and used to impersonate the legitimate user.

In the above example, when requesting access to the cloud service, the service returns a challenge, which needs to be signed using [asymmetric cryptography](https://en.wikipedia.org/wiki/Public-key_cryptography).
The Apertis application asks functionality in the secure environment to sign the challenge, using a private key that the secure environment stores.
The signed challenge is then returned to the cloud service, which checks the validity of the signature using the public key that it holds to authenticate the user.
Such systems may additionally require the state of the system to be verified (typically by building a [chain of trust](https://en.wikipedia.org/wiki/Chain_of_trust)) before use of the secure keys is allowed, thus ensuring the device hasn't been altered in ways that may compromise protection of the keys.

Whilst a system could be architected to utilise a separate processor to perform such tasks, this significantly drives up system complexity and cost.
Some platforms provide a mechanism to enable a secure, trusted environment or "[Trusted Execution Environment](https://en.wikipedia.org/wiki/Trusted_execution_environment)" (TEE) to be set up.
A TEE runs on the application processor, but with mechanisms in place to isolate the code and data of the two running systems (the TEE and the main OS) from each other.
ARM provides an implementation of such security mechanisms, known as [ARM TrustZone](https://developer.arm.com/ip-products/security-ip/trustzone), mainly on Cortex-A processors.

## System Architecture

A TEE exists as a separate environment running in parallel with the main operating system.
At boot, both of these environments need to be loaded and initialised; this is achieved by running special boot firmware which enables the TrustZone security features and loads the required software elements.
When enabled, a "secure monitor" runs in the highest privilege level provided by the processor.
The secure monitor supports switching between the trusted and untrusted environments and enables messages to be passed from one environment to the other.
ARM provides a reference secure monitor as part of the [ARM Trusted Firmware](https://github.com/ARM-software/arm-trusted-firmware) (ATF) project.
The ATF secure monitor provides an API to enable the development of trusted operating systems to run within the trusted environment; one such trusted OS is the open source [OP-TEE](https://www.op-tee.org/).
OP-TEE provides a trusted environment which can run [Trusted Applications](#trusted-applications) (TAs), which are written against the TEE internal API.

As well as starting up a trusted OS in the trusted environment, ATF typically starts a standard OS such as Linux on the untrusted side, known as the rich operating system or "Rich Execution Environment" (REE), by running the firmware normally used for this OS.
The OS needs drivers that are capable of interfacing with the secure monitor and that understand how to format messages for the trusted OS used on the trusted side.
Linux contains a [TEE subsystem](https://www.kernel.org/doc/Documentation/tee.txt) which provides a standardised way to communicate with TEE environments.
The OP-TEE project has upstreamed a [driver](https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/tee/optee) to this subsystem to enable communications with the OP-TEE trusted environment.

OP-TEE relies on the REE to provide a number of remote services, such as file system access, as it does not have drivers for this functionality itself.
The OP-TEE project provides a Linux user space [supplicant daemon](https://github.com/OP-TEE/optee_client) which supplies the services required by the trusted environment.
A library is also provided which implements a standardised mechanism, documented in the [GlobalPlatform TEE Client API Specification v1.0](https://globalplatform.org/specs-library/tee-client-api-specification/), for communicating with the TEE.
Applications needing to communicate with the TAs are expected to use this library.
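As a concrete illustration, the sketch below shows how an application in the REE might use this client library to ask a TA to sign an authentication challenge, along the lines of the cloud-service example earlier in this document.
This is a sketch rather than a definitive implementation: `TA_SIGNER_UUID` and `CMD_SIGN_CHALLENGE` are hypothetical values that would be defined by the TA author, and error handling is minimal.

```c
/* Sketch of an REE application using the GlobalPlatform TEE Client API
 * (as implemented by the OP-TEE client library) to ask a TA to sign a
 * challenge. The TA UUID and command ID are hypothetical. */
#include <err.h>
#include <stdint.h>
#include <string.h>
#include <tee_client_api.h>

#define TA_SIGNER_UUID \
	{ 0x12345678, 0x1234, 0x1234, \
	  { 0x12, 0x34, 0x12, 0x34, 0x12, 0x34, 0x12, 0x34 } }
#define CMD_SIGN_CHALLENGE 0

int main(void)
{
	TEEC_Context ctx;
	TEEC_Session sess;
	TEEC_Operation op;
	TEEC_UUID uuid = TA_SIGNER_UUID;
	uint32_t origin;
	char challenge[32] = "challenge-from-cloud-service";
	char signature[256];

	/* Connect to the TEE via the kernel TEE subsystem. */
	if (TEEC_InitializeContext(NULL, &ctx) != TEEC_SUCCESS)
		errx(1, "cannot connect to the TEE");

	/* Open a session with the hypothetical signing TA. */
	if (TEEC_OpenSession(&ctx, &sess, &uuid, TEEC_LOGIN_PUBLIC,
			     NULL, NULL, &origin) != TEEC_SUCCESS)
		errx(1, "cannot open a session with the TA");

	/* Pass the challenge in, and receive the signature back. */
	memset(&op, 0, sizeof(op));
	op.paramTypes = TEEC_PARAM_TYPES(TEEC_MEMREF_TEMP_INPUT,
					 TEEC_MEMREF_TEMP_OUTPUT,
					 TEEC_NONE, TEEC_NONE);
	op.params[0].tmpref.buffer = challenge;
	op.params[0].tmpref.size = sizeof(challenge);
	op.params[1].tmpref.buffer = signature;
	op.params[1].tmpref.size = sizeof(signature);

	if (TEEC_InvokeCommand(&sess, CMD_SIGN_CHALLENGE, &op,
			       &origin) != TEEC_SUCCESS)
		errx(1, "signing failed");

	/* op.params[1].tmpref.size now holds the signature length. */
	TEEC_CloseSession(&sess);
	TEEC_FinalizeContext(&ctx);
	return 0;
}
```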

## Boot Process

From a high level, the basic change required to the boot process is that the TEE needs to be set up before the REE.
The factor missing from this description is security.
In order for the TEE to achieve its stated goal of providing a secure environment, the boot process must be able to guarantee that at least the setup of the TEE has not been tampered with.
Such guarantees are provided by enabling secure boot for the relevant platform.

The process used to perform a secure boot depends on the mechanisms provided by the platform, which vary from vendor to vendor.
Typically it requires the boot process to be locked down to boot from known storage (such as a specific flash device) and for the boot binaries to be signed so that they can be verified at boot.
The keys used for verification are usually read-only and held in fuses within the SoC.

The signed binaries comprise a series of bootloaders which progressively bring up the system, each able to perform a bit more of the process using support enabled by earlier bootloaders.
This series of bootloaders will load the secure monitor (known as `EL3 Runtime Software` in this context), OP-TEE (the `Secure-EL1 Payload`) and finally U-Boot (the `Non-trusted Firmware`), which loads Linux.

The ARMv8 architecture provides four privilege levels, known as exception levels.
The lowest, EL0, is used for executing user code under an OS or hypervisor.
The next level, EL1, is used for running an OS like Linux, with EL2 above it available to run a hypervisor.
The highest level, EL3, is used for the secure monitor.

A more in-depth description of the boot process can be found in the [OP-TEE documentation](https://trustedfirmware-a.readthedocs.io/en/latest/design/firmware-design.html).

## Trusted Applications

Trusted Applications (TAs) are applications that run within the trusted environment, on top of OP-TEE.
Trusted Applications are used to provide the secured services and functionality that is needed in the platform.
TAs are identified by a UUID and are usually loaded from a file, named after that UUID, stored in the untrusted file system.
To ensure that TAs haven't been tampered with, they are signed.
If the contents of a TA need to remain confidential, there are options for additionally storing it encrypted.
Alternatively, if a TA is required before the tee-supplicant is running (and hence before it can be loaded from the file system), it can be built into the firmware as an early TA.
A more in-depth description of TA implementation can be found in the [OP-TEE documentation](https://optee.readthedocs.io/en/latest/architecture/trusted_applications.html).

The OP-TEE project provides a number of [TA examples](https://github.com/linaro-swg/optee_examples).

Trusted Applications provide immense flexibility in the functionality that can be provided from the TEE environment.
This flexibility is such that a proof of concept exists in which Microsoft's [TPM 2.0 reference implementation](https://github.com/Microsoft/ms-tpm-20-ref) is [used in OP-TEE](https://github.com/jbech-linaro/manifest/tree/ftpm).
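The trusted side of such an exchange, for example the challenge-signing use case described at the start of this document, could look roughly like the following sketch of a TA's entry points, which are defined by the TEE Internal Core API and built using the TA-devkit.
The command ID is hypothetical, the signing logic is omitted, and so is the build glue (such as the UUID declared in `user_ta_header_defines.h`).

```c
/* Sketch of a Trusted Application's entry points, as defined by the
 * TEE Internal Core API. Only the command dispatch is shown; the
 * cryptographic operations themselves are omitted. */
#include <tee_internal_api.h>

#define CMD_SIGN_CHALLENGE 0	/* hypothetical command ID */

TEE_Result TA_CreateEntryPoint(void)
{
	return TEE_SUCCESS;
}

void TA_DestroyEntryPoint(void)
{
}

TEE_Result TA_OpenSessionEntryPoint(uint32_t param_types,
				    TEE_Param params[4],
				    void **session_context)
{
	(void)param_types; (void)params; (void)session_context;
	return TEE_SUCCESS;
}

void TA_CloseSessionEntryPoint(void *session_context)
{
	(void)session_context;
}

TEE_Result TA_InvokeCommandEntryPoint(void *session_context,
				      uint32_t command_id,
				      uint32_t param_types,
				      TEE_Param params[4])
{
	(void)session_context; (void)param_types;

	switch (command_id) {
	case CMD_SIGN_CHALLENGE:
		/* Here the TA would load its private key from secure
		 * storage (TEE_OpenPersistentObject) and sign params[0]
		 * into params[1], for example with
		 * TEE_AsymmetricSignDigest. */
		(void)params;
		return TEE_SUCCESS;
	default:
		return TEE_ERROR_NOT_SUPPORTED;
	}
}
```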

## Virtualisation Support

As the hypervisor and secure monitor each have a separate privilege level, it is possible for the TEE to co-exist with systems running a hypervisor.
OP-TEE currently has [experimental support](https://optee.readthedocs.io/en/latest/architecture/virtualization.html) for the Xen hypervisor running on an emulated ARMv8 system.
The current approach provides a separate context for each of the virtual machines (VMs) running on the hypervisor.
This requires support from the hypervisor, both to enable communication between the VMs and the TEE and to ensure that the TEE uses the context associated with the calling VM.
The experimental support currently disables access to hardware resources, such as cryptographic engines, in the TEE, as a mechanism to share such resources safely between the separate TEE contexts has not yet been created.

# Enabling TEE in Apertis

Apertis does not provide the vast majority of the functionality needed to implement a TEE.
A number of steps need to be taken in order to enable TEE support in Apertis.

## Secure Boot

Secure boot provides an important first step in the initialisation of the TEE by ensuring that the initialisation process is able to proceed without interference.
Unfortunately this fundamental step is very platform-dependent and cannot be solved in the general case.
Apertis has already taken steps to [document and demonstrate secure boot](secure-boot.md).
At the moment, Apertis only ships some support for secure boot on the SABRE Lite platform.
This provides a good reference for the overall process but, unfortunately, the SABRE Lite is not a good choice as a technology demonstrator for TEE due to its age.

We advise implementing a TEE demonstrator on a more modern platform to take advantage of the more advanced functionality found in such platforms.
This will be covered in more detail in our recommendations for the [next steps](#next-steps).

In addition to the board verifying the initial binaries that are executed, it is important that the verification of binaries continues through the boot process in order to build a [chain of trust](https://en.wikipedia.org/wiki/Chain_of_trust), so that later stages can determine whether boot was carried out appropriately.

## ARM Trusted Firmware

The current ARM Trusted Firmware package in Debian does not build for any platforms currently supported in Apertis.
The package will need to be tweaked to sign the ATF binaries using an Apertis key.
In order to support ATF in Apertis, one of the following options will need to be taken:

- Adopt a platform already supported by the build as an additional platform in Apertis
- Enable support for a platform supported by ATF but not currently built by the deb packaging
- Add support for a preferred platform to ATF and enable it in the packaging

From the perspective of enabling ATF, these are broadly in order of effort, though clearly adding an additional platform to Apertis increases the effort for ongoing baseline maintenance.

### Requirements

In order to implement [Trusted Board Boot](https://trustedfirmware-a.readthedocs.io/en/latest/design/trusted-board-boot.html) it will be necessary to upgrade `mbedtls`.
This functionality is likely to be considered critical by project developers.

## OP-TEE OS

The OP-TEE project provides the [OP-TEE OS](https://github.com/OP-TEE/optee_os) as the trusted OS that runs in the TEE.
This is not currently packaged for Debian, and it would need to be in order to be incorporated into Apertis.
As with ATF, an Apertis key will need to be used to sign the binaries intended for the TEE, to maintain the chain of trust.
Currently, when OP-TEE is built, it embeds the public key that will be used for verifying TAs.
As with the keys used in other steps of this process, in order to ensure that products are properly secured, it would be necessary for product teams to, at a minimum, replace this key with a product-specific one.
A product team may wish to modify OP-TEE to support alternative key management solutions; this is [expected by the OP-TEE developers](https://github.com/OP-TEE/optee_os/issues/2233#issuecomment-379253182).

In addition to the trusted OS, the build of the OP-TEE OS source also builds the TA-devkit.
The TA-devkit provides the resources necessary to both build and sign TAs.
The TA-devkit will need to be packaged so that it can be provided as a build dependency for any TAs.

## Linux Kernel

Debian (and thus the Apertis config) already enables the TEE subsystem on arm64, where ATF can be used.
It is understood that this should be sufficient, and thus no extra modifications to the kernel will be required.

## OP-TEE Supplicant and User Space Libraries

In addition to the trusted OS, the OP-TEE project provides the [OP-TEE supplicant and TEE Client API](https://github.com/OP-TEE/optee_client).
The supplicant provides the services to OP-TEE that it does not provide itself, and the TEE Client API provides a user space API in the REE for communicating with the TEE.
As with the OP-TEE OS, these components are currently not packaged for Debian and would need to be.
As these components run in the REE, they do not need to be signed.

## Sample TAs

To enable early investigation of TEEs on Apertis, the [example TAs](https://github.com/linaro-swg/optee_examples) should be packaged.
For simple use cases, these examples may either fulfil the TEE requirements directly or provide a framework for developing them.
Even for the use cases that they do not cover, they will provide a useful reference for how to package TAs for Apertis.

The sample TAs will be signed with the key provided by the Apertis TA-devkit package (which will be a build dependency) and thus will be usable with the OP-TEE OS built for Apertis.

## Test Suite

A [test suite](https://github.com/OP-TEE/optee_test) exists for OP-TEE.
Providing this in Apertis would enable developers to gain some confidence that OP-TEE was installed and initialised correctly.

## Debos Scripting

Once components are added to the Apertis project, we need a way to combine them into an image that can be booted on the target platform.
In Apertis this is performed by Debos, using configuration files to determine exactly which packages are added to each image.
This also allows the images to be built automatically and regularly using the latest versions of packages.
A special image to automate configuration of the boot process can also be generated, like the one provided to update the U-Boot bootloader for the [i.MX6 SABRE Lite board](https://gitlab.apertis.org/infrastructure/apertis-image-recipes/-/blob/apertis/v2021dev2/mx6qsabrelite-uboot-installer.yaml).

# Next Steps

Integrating OP-TEE into Apertis substantially alters the boot process, and requires secure boot to be working correctly in order to be valuable.

Whilst the research carried out to write this proposal has attempted to consider the impacts of adding this support to Apertis, there remains a risk that some potential issues have gone unnoticed.
We therefore advise following up this document by adding the support to Apertis for at least one reference platform, so that the basic components are formally integrated into Apertis; this will provide a solid reference for product teams and further lower the risk of Apertis adoption for teams wishing to use OP-TEE.

## Choice of Reference Platform

The OP-TEE project specifically targets the ARM ecosystem, in particular platforms that provide ARM TrustZone.
ARM TrustZone has been improved in later iterations of the technology and standardised, with a reference implementation for using TEEs available as part of the ATF project.
We recommend that a platform capable of utilising ATF is chosen for this reference.
An advantage of implementing the TEE using ATF is that this provides a standardised interface for the trusted OS, and thus allows Apertis to potentially be used with alternative trusted OS implementations.

## Test Integration

The availability of a test suite provides some coverage of the OP-TEE functionality with minimal effort, as it should be usable from automated testing.

Whilst the test suite will test the operation of OP-TEE itself, an important part of initialising a TEE is the platform-specific secure boot.
Unless using a platform very closely aligned with an Apertis reference platform, this step will be the responsibility of the product team.
To ensure that this is properly implemented, tests could be developed that attempt to utilise incorrectly signed binaries at the different stages of the boot process, to ensure that each step is properly validated, providing a reference for how to test secure boot.

Experience with the SABRE Lite has shown that whilst devices may be set up to emulate a secured configuration, their behaviour differs from that of devices locked via their embedded fuses. Since boards locked in a secure boot configuration no longer allow some operations, they become less useful for general development. For this reason, a dedicated set of boards locked via fuses may be required to fully test that secure boot restrictions are being enforced.
diff --git a/content/designs/permissions.md b/content/designs/permissions.md
new file mode 100644
index 0000000000000000000000000000000000000000..759557bb93a5da9ccd97918456ee9fc8b4003f65
--- /dev/null
+++ b/content/designs/permissions.md
@@ -0,0 +1,1209 @@
---
title: Permissions
short-description: Assigning security rules from declarative metadata
  (unimplemented)
authors:
  - name: Simon McVittie
---

# Permissions

## Introduction

This document extends the higher-level [Applications] and [Security]
design documents to go into more detail about the permissions framework.

Applications can perform many functions on a variety of user data. They
may access interfaces that read data (such as contacts, network state,
or the user's location), write data, or perform actions that can cost the
user money (like sending SMS). As an example, the [Android] operating
system has a comprehensive [manifest][Android manifest]
that governs access to a wide array of functionality.

Some users may wish to have fine-grained control over which applications
have access to specific device capabilities, and even those who don't
should still be informed when an application has access to their data
and services.

### Terminology

[Integrity, confidentiality and availability] are defined in the
Security concept design.

Discussions of the security implications of a use-case in this document
often mention the possibility of a *malicious* or *compromised* app-bundle.

For brevity, this should be understood to cover all situations where
malicious code might run with the privileges of a particular app-bundle,
including:

* An app-bundle whose author or publisher deliberately included malicious
  code in the released version (a Trojan horse)
* An app-bundle whose author or publisher accidentally included malicious code
  in the released version, for example by using a maliciously altered compiler
  like [XcodeGhost]
* An app-bundle where there is no directly malicious code in the
  released version, but there is a security vulnerability that an attacker
  can exploit to run malicious code of their choice with the privileges
  of the app-bundle

### Scope of this document

This document aims to define a general approach to permissions, so that
future work on a particular feature that requires permissions only
involves defining a permission or a set of permissions for that feature,
rather than designing an entire permissions framework.

This document also aims to define permissions for basic features that
are already present in Apertis and already well-understood. For example,
access to external and shared storage is in-scope.

This document does not aim to define permissions for features that are
not already present in Apertis, or that are not already well-understood.
For example, defining detailed permissions for [egress filtering] is
out of scope.

The permissions framework is not intended to cover all the needs of built-in
application bundles. See [Non-use-cases] below. However, some classes of
built-in application bundles are anticipated to be implemented by every
Apertis vendor as a means of product differentiation, and these could
benefit from having a shortcut way to request particular permissions;
these use cases are marked below.

## Use cases

### Internet access

A general-purpose Internet application like a web browser might require
full, unfiltered Internet access, including HTTP, HTTPS, DNS, WebSockets
and other protocols.

A podcast player might require the ability to download arbitrary files
via HTTP using a service like the Apertis Newport download manager, but
might not require any other Internet access.

A simple game might not require any network access at all, or might
only require the indirect network access (launching URIs) that can be
obtained by communicating with the Didcot [content handover] service.

Many intermediate levels of Internet access are possible, but for the
purposes of this document we do not consider them. See the
[Egress filtering design notes on the Apertis wiki][Egress filtering]
for initial work on finer-grained control.

#### Security implications

An application with Internet access might be compromised by malicious
inputs from the Internet (an integrity failure). If an application
cannot contact the Internet, we can be confident that it cannot be
attacked in this way, although it could still be compromised by
malicious content that is downloaded locally and passed to it via
[content handover].

If an application with Internet access is compromised either remotely or
by opening malicious local content, it could be induced to send private
data to an attacker-controlled server (a confidentiality failure). This
attack applies equally to applications with access to a download manager
like Newport, because the private data could be encoded in the URI
to be downloaded.
If the download manager offers control over request +headers such as cookies or the HTTP `Referer`, the private data could +also be encoded in those. Applications that can request URIs via a +[content handover] service could also be susceptible to this attack, +but only if the content handover service will pass them to a handler +without user interaction, *and* the handler for those URIs will fetch +the URI without user interaction. + +If an application with Internet access is compromised, but does not +already contain malicious code to carry out actions of the attacker's +choice (the *payload*), a common technique is to download a payload +from a server controlled by the attacker. In particular, this allows +an attacker to alter the payload over time according to their current +requirements, for example to form a botnet that can be used for multiple +purposes. An application with access to a download manager like Newport +is equally susceptible to this, even if it cannot access the Internet +itself, because it can ask Newport to download the new payload from +the attacker's server. However, an application that can only use a +[content handover] service is not susceptible to this attack, because +such applications are not allowed to see the result of the HTTP request. + +#### In other systems + +In [Android], the web browser and podcast player described in the use-cases +above would have the `INTERNET` permission, but the game would not. Using +the Android DownloadManager service (equivalent to Newport) also requires +`INTERNET` permission, because it enables most of the same attacks as +direct Internet access. + +In [Flatpak], the web browser would have the `shared=network` permission +and the game would not. The game could still send requests to the +URI-opening [portal][portals]: assuming that only one web browser is +installed, the URI-opening portal would normally pass on HTTP and HTTPS +URIs to a web browser without user consent (with the result that the +browser makes `GET` requests to the appropriate web server), but prompt +the user before passing other URIs to the URI handler. The podcast player +could have `talk` access to a D-Bus service equivalent to Newport, but as +noted above, that would be essentially equivalent to arbitrary HTTP access +in any case. + +In [iOS], all app-bundles have Internet access. + +### Geolocation + +A navigation app-bundle needs to know the precise location of the vehicle, +but an app-bundle to suggest nearby restaurants might only need to know +the location within a few miles, and an e-book reader does not need to know +the location at all. + +#### Security implications + +The user's geographical location is sensitive information, especially +if it is precise, and is valuable to criminals. In some cases disclosing it +would be a threat to personal safety. + +#### In other systems + +In [Android] the navigation app-bundle would have `ACCESS_FINE_LOCATION`, +the restaurant guide would have `ACCESS_COARSE_LOCATION` and the +e-book reader would have neither. + +In [Flatpak], the user is asked for permission to use geolocation the +first time it is used, with the option to remember that permission +for all future requests. The app-bundle is not required to declare in +advance whether it might use geolocation. + +In [iOS], two forms of geolocation can be requested, by using +`NSLocationAlwaysUsageDescription` or `NSLocationWhenInUseUsageDescription`. 
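Apertis does not define a concrete geolocation API in this document. As
an illustration of the coarse/fine distinction above, the sketch below
uses the GeoClue client library (`libgeoclue`), in which a restaurant
guide could request city-level accuracy while a navigation app-bundle
would request `GCLUE_ACCURACY_LEVEL_EXACT`. Whether Apertis would expose
GeoClue directly is an assumption, and the app ID is hypothetical.

```c
/* Illustrative only: a restaurant-guide app requesting coarse
 * (city-level) geolocation via libgeoclue. A navigation app would
 * instead pass GCLUE_ACCURACY_LEVEL_EXACT. */
#include <geoclue.h>

int
main (void)
{
  g_autoptr(GError) error = NULL;

  /* "com.example.RestaurantGuide" is a hypothetical app ID. */
  GClueSimple *simple =
      gclue_simple_new_sync ("com.example.RestaurantGuide",
                             GCLUE_ACCURACY_LEVEL_CITY,
                             NULL, &error);

  if (simple == NULL)
    {
      g_printerr ("geolocation unavailable or denied: %s\n",
                  error->message);
      return 1;
    }

  GClueLocation *location = gclue_simple_get_location (simple);

  g_print ("approximate position: %f, %f\n",
           gclue_location_get_latitude (location),
           gclue_location_get_longitude (location));

  g_object_unref (simple);
  return 0;
}
```

A permissions framework would then only need to decide, from the
app-bundle's declared permissions, whether such a request is allowed
and at which accuracy level.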
+ +### Initiating a phone call + +A contact management application might wish to initiate a phone call +without further user consent, for example when the user taps a phone +icon next to a contact. + +An application that is only tangentially related to phone calls, such as an +app-bundle to suggest nearby restaurants, might not wish to request permission +to do that. Instead, it could initiate a phone call by launching an appropriate +`tel:` URI, which would normally result in a built-in application or a platform +service popping up a call dialog with buttons to initiate the call or cancel +the transaction, the same as would happen on selecting a `tel:` link in a +web browser. + +An e-book reader does not need to initiate phone calls at all. + +#### Security implications + +If an app-bundle can initiate calls without user +consent, this will result in the user's microphone being connected to the +call recipient, which is a confidentiality (privacy) failure. Making +undesired calls can also cost the user money, and in particular a malicious +app author might place calls to a premium rate number that pays them. + +#### In other systems + +In [Android], the contact management app-bundle would have the `CALL_PHONE` +permission. The restaurant guide and the e-book reader would not, +but would still be able to launch an intent that results in the system +phone dialler app being shown, giving the user the opportunity to +confirm or cancel. + +In [Flatpak], the contact management app-bundle would have `talk` access +to a D-Bus service offering immediate phone dialling, for example +the Telepathy Account Manager, or to a group of services, for example +Telepathy. The restaurant guide and the e-book reader would not, but +would be able to launch a `tel:` URI, which would be handled in much +the same way as the Android intent. + +In [iOS], user-installable app bundles would presumably launch `tel:` +URIs. There does not appear to be a way for a non-platform-level component +to dial phone numbers directly. + +### Shared file storage + +A media player with a gallery-style user experience might require the +ability to read media files stored on external storage (a USB thumb drive +or externally-accessible SD card), or in a designated +[shared area][shared data]. + +Similarly, a media player might require access to media indexing and browsing +as described in the [Media Management concept design]. + +A podcast player might wish to store downloaded podcasts on external +storage devices or in the shared storage area so that media players can +access them. + +#### Security implications + +App-bundles with write access to this shared +storage can modify or delete media files; if this is done inappropriately, +that would be an availability or integrity failure. App-bundles with read +access can observe the media that the user consumes, which could be considered +privacy-sensitive; uncontrolled access would be a confidentiality failure. +Malicious app-bundles with write access could also write malformed media +files that were crafted to exploit security flaws in other app-bundles, +in the platform, or in other devices that will read the same external +storage device, leading to an integrity failure. 

#### In other systems

In recent [Android], the `READ_EXTERNAL_STORAGE` permission is required
(the shared area on Android devices was traditionally a removable SD card,
leading to the name of the relevant permission and APIs, even though in
more recent devices it is typically on non-removable flash storage). In
older Android, that permission did not exist or was not enforced.

Similarly, the `WRITE_EXTERNAL_STORAGE` permission governs writing;
that permission was always enforced, but is very widely requested.

In [Flatpak], any directory of interest can be mapped into the filesystem
namespace of sandboxed processes, either read-only or read/write,
via the `filesystems` metadata field. Values like `xdg-music` and
`xdg-download/Podcasts` make common use cases relatively straightforward,
and provide considerably finer-grained control than in Android.

In [iOS], access to media libraries is mediated by the
`NSAppleMusicUsageDescription` and `NSPhotoLibraryUsageDescription`
metadata fields.

### Launcher

This use-case is only applicable to built-in app-bundles.

A vendor-specific application launcher, such as the [Mildenhall Launcher] in
the Apertis Mildenhall reference user interface, needs to list all the
application entry points on the system together with their metadata.
It also needs to launch those entry points on demand.

#### Security implications

Holding this permission negates the Apertis platform's usual concept
of [application list privacy]: an app-bundle with this permission can
enumerate the entry points, which is valuable if an attacker wishes to
identify a particular user (fingerprinting). If unintended app-bundles
gain this access, it is a confidentiality failure.

#### In other systems

[Android] does not appear to restrict the visibility of other app-bundles.

[Flatpak] app-bundles can only observe the existence of other app-bundles
if their D-Bus filtering is configured to be able to `see` their
well-known names.

[iOS] restricts the visibility of other app-bundles, although
[fingerprinting][iOS fingerprinting]
can be carried out by abusing inter-app communication.
Because iOS is a single-vendor system, the security mechanisms used by
platform components and by the equivalent of our built-in app bundles do not
have public documentation.

### Settings

This use-case is probably only applicable to built-in app-bundles.

Suppose a vendor has a [system preferences application] that provides
an overview of all [system settings], [user settings] and [app settings],
such as the [Mildenhall Settings] application bundle in the Apertis
Mildenhall reference user interface. That application needs to list the
app settings belonging to all store and built-in app-bundles, and needs
the ability to change them, without prompting the user.

#### Security implications

Holding this permission negates the Apertis platform's usual concept of
[application list privacy], similar to the [Launcher] use case.

Unconstrained settings changes are also very likely to allow arbitrary
code execution with the privileges of other components that trust those
settings, which would be a serious integrity failure if carried out by
an attacker.

#### In other systems

In [Android], the `CHANGE_CONFIGURATION` permission grants the ability
to change system configuration in some limited ways, and the
`WRITE_SETTINGS` permission grants the ability to carry out
more settings changes.

In [Flatpak] when used with GNOME, granting write access to `dconf`
(by making its files readable in the sandbox, and granting `talk` access
to the dconf service) gives unconstrained access to all settings.

Because [iOS] is a single-vendor system, the security mechanisms used by
platform components and by the equivalent of our built-in app bundles do not
have public documentation.

### Restricted subsets of settings

A photo viewer might have an option to set a particular photo as "wallpaper".
A travel-related app-bundle might have an option to set the time zone,
and a media player might have options to change audio parameters.
An e-book reader does not require the ability to do any of those.

#### Security implications

In general, these subsets of settings are chosen so that an attacker changing
them would be an annoyance rather than a serious integrity failure, mitigating
the attacks that are possible in the use-case above. However, the effect
of changing a setting is not always immediately obvious: for example, setting
untrusted images as wallpaper could lead to a more serious integrity failure
if there is an exploitable flaw in an image decoder used by the platform
component or built-in app-bundle that displays the wallpaper.

#### In other systems

In [Android], the photo viewer might have the `SET_WALLPAPER` and
`SET_WALLPAPER_HINTS` permissions, the travel-related app-bundle
might have `SET_TIME_ZONE`, and the media player might have
`MODIFY_AUDIO_SETTINGS`.

[Flatpak] does not currently have portals for these, but a Flatpak app-bundle
could be given `talk` access to a D-Bus service that would allow these
actions.

[iOS] does not appear to provide this functionality to third-party app-bundles.

### Granting permission on first use

The author of a hotel booking app-bundle includes a feature to locate nearby
hotels by using the Apertis geolocation API. Because
[users are more likely to grant permission to carry out privacy-sensitive
actions if they can understand why it is needed][Permissions on-demand],
the app author does not want the Apertis system to prompt for access
to the geolocation feature until the user actively uses that particular
feature.

#### Not granting permission on first use

Conversely, an automotive vendor wishes to minimize driver distraction in
order to maximize safety. When the same hotel booking app-bundle attempts
to use geolocation while the vehicle is in motion, the platform vendor
might want the Apertis system to **not** prompt for access to the
geolocation feature, contrary to the wishes of the app author. Instead,
the user should be given the opportunity to enable geolocation at a time
when it is safe to do so, either during app-bundle installation or as a
configuration/maintenance operation while the vehicle is stationary at
a later time.

Note that those two use cases have contradictory expectations: this is a
user experience trade-off for which there is no single correct answer.

#### In other systems

[iOS] prompts for permission to carry out each privileged operation at
the time of first use.

[Flatpak] mostly does the same, but with some pragmatic exceptions:
lower-level permissions, such as access to direct rendering devices for
3D games or direct access to the host filesystem, are implemented in a
way that precludes that model. These are set up at installation time, and
can be overridden by user configuration.
When a Flatpak app is launched,
it is given the level of access that was appropriate at launch time.

[Android] 6.0 and later has the same behaviour as iOS.
Older Android versions configured all permissions at installation time,
with a simple UX: the user had to either accept all required permissions,
or abort installation of the app. Some permissions, notably access to
shared storage (the real or emulated SD card), were implemented in a way
that precluded runtime changes: app processes with access to shared storage
ran with one or more additional Unix group IDs, granting them DAC permission
to the appropriate areas of the filesystem.

### Tightening control

Suppose that Apertis version 1 allows all app-bundles to query the vehicle
model, but the Apertis developers later decide this is a privacy risk, and so
Apertis version 2 restricts it with a permission. The app framework should
be able to detect that an app-bundle was compiled for version 1, and behave
as though that app-bundle had requested the necessary permission to query
the vehicle model. It should not do that for an app-bundle compiled for
version 2.

#### Security implications

App-bundles that were compiled for version 1 would still be able to
carry out any attacks that were applicable before version 2 was released.
This use-case is only applicable if those attacks are considered to be
less serious than breaking backwards compatibility with older app-bundles.

#### In other systems

In [Android], a simple integer "API level" is used to indicate
the version of the Android API. Each app-bundle has a *minimum API level*
and a *target API level*. The app framework enables various compatibility
behaviours to make APIs resemble those that were present at the
target API level; one of these compatibility behaviours is to behave
as though app-bundles whose target API level is below a threshold
had requested extra permissions. For example, Android behaves as though
app-bundles with a target API level below 4 had requested
`android.READ_PHONE_STATE`.

In [Flatpak], app-bundles can specify a minimum Flatpak version. There
is currently no mechanism to specify a target API level, although one
could be inferred from the runtime branch that the app-bundle has
chosen to use, such as `org.freedesktop.Platform/1.4` or
`org.gnome.Platform/3.22`.

In [iOS], keys like [NSAppleMusicUsageDescription] are
documented as behaving like permissions, but only if the app was linked
on or after iOS 10.0.

### Loosening control

Suppose Apertis version 1 restricts querying the vehicle paint colour with a
permission, but the Apertis developers later decide that this does not need
to be restricted, and Apertis version 2 allows all app-bundles to do that.
The app framework should never prompt the user for that permission.
If an app-bundle designed for version 1 checks whether it has that
permission, the app framework should tell it that it does.

#### Security implications

This use-case is only applicable if the Apertis developers have decided
that the security implications of the permission in question (in this
example, querying the paint colour) are not significant.

#### In other systems

We are not aware of any permissions that have been relaxed like this
in Android, Flatpak or iOS, but it would be straightforward for any of these
frameworks to do so: they would merely have to stop presenting a user
interface for that permission, and make requests for it always succeed.
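A hypothetical sketch of how an app framework could combine these two
behaviours, keyed on the API level that an app-bundle targets, is shown
below. None of these names are real Apertis APIs: the permission names
and the threshold constant come from the vehicle-model and paint-colour
examples above, and the helper functions are assumed to exist.

```c
/* Hypothetical sketch: how an app framework might implement the
 * tightening and loosening behaviours described above. No such
 * Apertis API currently exists. */
#include <stdbool.h>
#include <string.h>

/* Invented constant: the API level at which querying the vehicle
 * model became restricted by a permission. */
#define API_LEVEL_VEHICLE_MODEL_RESTRICTED 2

struct bundle_manifest {
  int target_api_level;   /* the API level the bundle was built for */
};

/* Assumed helpers, not real APIs: whether the manifest requested a
 * permission, and whether the user or app-store curator granted it. */
extern bool bundle_requested (const struct bundle_manifest *b,
                              const char *perm);
extern bool grant_recorded (const struct bundle_manifest *b,
                            const char *perm);

static bool
effectively_requested (const struct bundle_manifest *bundle,
                       const char *perm)
{
  /* Tightening: bundles targeting an API level older than the one
   * that introduced the permission behave as though their manifests
   * had requested it. */
  if (strcmp (perm, "query-vehicle-model") == 0 &&
      bundle->target_api_level < API_LEVEL_VEHICLE_MODEL_RESTRICTED)
    return true;

  return bundle_requested (bundle, perm);
}

static bool
effectively_granted (const struct bundle_manifest *bundle,
                     const char *perm)
{
  /* Loosening: a permission that no longer restricts anything is
   * reported as granted unconditionally, and no user interface is
   * ever shown for it. */
  if (strcmp (perm, "query-paint-colour") == 0)
    return true;

  return effectively_requested (bundle, perm) &&
         grant_recorded (bundle, perm);
}
```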

### Changing access

An Apertis user uses a Facebook app-bundle. The user wants their location at
various times to appear on their Facebook feed, so they give the app-bundle
permission to monitor their location, as in [Geolocation] above.

Later, that user becomes more concerned about their privacy. They want to
continue to use the Facebook app-bundle, but prevent it from accessing their
new locations. They use a user interface provided by the system vendor,
perhaps a [system preferences application], to reconfigure the permissions
granted to the Facebook app-bundle so that it cannot access their location.

Later still, that user wants to publish their location to their Facebook feed
while on a road trip. They reconfigure the permissions granted to the
Facebook app-bundle again, so that it can access their location again.

#### Security implications

This use-case is applicable if the user's perception of the most appropriate
trade-off between privacy and functionality changes over time.

#### In other systems

Android 6.0 and later versions have a
[user interface][Android app permissions] to revoke and reinstate broad
categories of permissions. Older [Android] versions had a hidden control
panel named [App ops][Android AppOps] controlling the same things at a
finer-grained level (individual permissions), but it was not officially
supported.

[iOS] allows permissions to be revoked or reinstated at any time via
the [Privacy page in its Settings app][iOS Privacy settings], which is the
equivalent of the Apertis [system preferences application].

## Potential future use-cases

Use cases described in this section are not intended to generate requirements
in the near future, and are not described in detail here. We recommend that
these use cases are expanded into something more detailed as part of design
work on the relevant feature: for example, Bluetooth permissions should be
considered as part of a more general Bluetooth feature design task.

However, as input to the design of the general feature of permissions,
it might be instructive to consider whether a proposed implementation could
satisfy the requirements that these use-cases are conjectured to have.

Because these use-cases have not been examined in detail, it is possible
that future work on them will result in the conclusion that they should
be outside the scope of the permissions framework described in this document.

### Audio playback

A music player requires the ability to play back audio while in the background.
A video player might require the ability to play ongoing audio, but only while
its window is in the foreground. An e-book reader might only require the
ability to play short notification sounds while in the foreground, or
might not require any ability to play sounds at all.
A voice-over-IP calling client requires the ability to play audio with
an elevated priority while a call is in progress, pre-empting other audio
players.

We recommend that these and related use cases are captured in detail as part
of the design of the Apertis audio manager.

#### Security implications

Uncontrolled audio playback seems likely to cause driver distraction.
Additionally, if all applications can play back audio with a priority of
their choice, a malicious app-bundle could output silence at a high
priority as a denial of service attack (a failure of availability).

#### In other systems

In [Android] and [iOS], audio playback does not require special permissions.
+ +In [Flatpak], audio playback currently requires making the PulseAudio +socket available to the sandboxed app, which also enables audio +recording and control. Finer-grained control over audio is planned +for the future. + +### Audio recording + +A memo recorder requires the ability to record audio. A voice-over-IP +calling client also requires the ability to record audio. Most applications, +including most of those that play back audio, do not. + +We recommend that these and related use cases are captured in detail as part +of the design of the Apertis audio manager. + +#### Security implications + +An app-bundle that can record audio could record +private conversations in the vehicle (a failure of confidentiality). + +#### In other systems + +In [Android], audio recording requires the `RECORD_AUDIO` permission. + +In [Flatpak], audio recording currently requires making the PulseAudio +socket available to the sandboxed app, which also enables audio +playback and control. + +In [iOS], audio recording is mediated by `NSMicrophoneUsageDescription`. + +### Bluetooth configuration + +A [system preferences application], or a separate Bluetooth control panel +built-in app-bundle, might require the ability to reconfigure Bluetooth +in detail and communicate with arbitrary devices. + +A less privileged app-bundle, for example one provided by the manufacturer +of peripheral devices like FitBit, might require the ability to pair and +communicate with those specific Bluetooth devices. + +A podcast player has no need to communicate with Bluetooth devices at all. + +#### Security implications + +For the control panel use-case, communicating with arbitrary devices might be +an integrity failure if the app-bundle can reconfigure the device or edit +data stored on it, or a confidentiality failure if the app-bundle can +read sensitive data such as a phone's address book. The ability for untrusted +app-bundles to view MAC addresses and other unique identifiers would also +be a privacy problem. + +The device-specific use case is a weaker form of the above, mitigating +the confidentiality and integrity impact. + +#### In other systems + +In [Android], the `BLUETOOTH` permission allows an app-bundle to communicate +with any Bluetooth device that is already paired. This is stronger +than is needed for a device-specific app-bundle. The `BLUETOOTH_ADMIN` +permission additionally allows the app-bundle to pair new Bluetooth devices. + +In [Flatpak], full access could be achieved by configuring Flatpak's +D-Bus filter to allow `talk` access to BlueZ. There is currently no +implementation of partial access; this would likely require a Bluetooth +[portal][portals] service. + +In [iOS], the [NSBluetoothPeripheralUsageDescription] metadata field +controls access to Bluetooth, which appears to be all-or-nothing. User +consent is requested the first time this permission is used, with the +metadata field's content included in the prompt. + +### Calendar + +A general-purpose calendar/agenda user interface similar to [GNOME Calendar] +or the [AOSP Calendar] requires full read/write access to the user's calendar. + +A calendar synchronization implementation, for example to synchronize with +calendar events stored in Google Calendar, Windows Live or OwnCloud, requires +full read/write access to its subset of the user's calendar. For example, a +Google Calendar synchronization app-bundle should have access to Google +calendars, but not to Windows Live calendars. 

A non-calendaring application like an airline booking app-bundle might
wish to insert events into the calendar without further user interaction,
or it might wish to insert events into the calendar in a way that presents
them for user approval, for example by submitting a vCalendar file for
[content handover].

A podcast player has no need to interact with the calendar at all.

#### Security implications

The general-purpose user interface described above would have the ability to
send calendar events to a third party (a confidentiality failure) or to
edit or delete them (an integrity failure).

The calendar synchronization example is a weaker form of the user interface
use-case: if malicious, it could cause the same confidentiality or integrity
failures, but only for a subset of the user's data.

If the airline booking app-bundle described above has the ability to insert
calendar events without user interaction, a malicious app-bundle could
insert misleading events, an integrity failure; however, it would not
necessarily be able to break confidentiality.

If the airline booking app-bundle operates via content handover, [intents],
[portals] or a similar mechanism that will result in user interaction,
a malicious app-bundle cannot insert misleading events without user action,
avoiding that integrity failure (at the cost of a more prescriptive UX).

#### In other systems

In [Android],
[the `READ_CALENDAR` and `WRITE_CALENDAR` permissions][Android calendar permissions]
are suitable for the general-purpose calendar use case.
[Sync adapters][Android calendar sync adapters]
receive different access; it is not clear from the Android documentation
whether their restriction to a specific subset of the calendar is enforced,
or whether sync adapters are trusted and assumed to not attack one another.
Applications that do not have these permissions, such as the airline booking
use-case above, can use [calendar intents] to send or receive calendar
events, with access mediated through a general-purpose calendar user
interface that is trusted to behave according to the user's intention.
There is no way to prevent an app-bundle from using those intents at all.

In [Flatpak], a general-purpose calendar might be given `talk` access to
the `evolution-data-server` service. There is currently no calendar
[portal][portals], but when one is added it will presumably be analogous
to Android intents.

In [iOS], the [NSCalendarsUsageDescription] metadata field
controls access to calendars. User consent is requested the first time
this permission is used, with the metadata field's content included
in the prompt.

### Contacts

The use cases and security implications for contacts are analogous to those
for the [calendar] and are not discussed in detail here.

#### In other systems

[Android contact management]
is analogous to calendaring, using the `READ_CONTACTS` and `WRITE_CONTACTS`
permissions or contact-specific intents.

In [Flatpak], as with calendaring, a general-purpose contacts app-bundle
might be given `talk` access to the `evolution-data-server` service. There is
currently no contacts [portal][portals], but when one is added it will
presumably be analogous to Android intents.

[iOS contact management][NSContactsUsageDescription] is analogous to
iOS calendaring.
+ +### Inter-app communication interfaces + +Inter-app communication has not been designed in detail, but the +draft design on the Apertis wiki suggests that it might be modelled +in terms of [interface discovery], with app-bundles able to implement +"public interfaces" that are made visible to other app-bundles. +The draft design has some discussion of how [restricting interface providers] +might be carried out by app-store curators. + +Additionally, if app-bundles export public interfaces, this might influence +whether other applications are allowed to communicate with them: if a +particular public interface implies that other app-bundles will communicate +directly with the implementor, then the implementor's AppArmor profile +and other security policies must allow that. A [sharing] feature similar to +the one in Android is one possible use-case for this. + +We recommend that this topic is considered as one or more separate +concept designs, with its security implications considered at the same time. +This is likely to be more successful if a small number of specific use-cases +are considered, rather than attempting to define a completely abstract and +general framework. + +In [Android], any app-bundle can define its own intents. If it does, those +intents can be invoked by any other app-bundle that holds appropriate +permissions, and it is up to the implementor to ensure that that is a +safe thing to do. + +In [Flatpak], app-bundles that will communicate via D-Bus can be given `talk` +access to each other. If this is done, it is up to the app-bundles to +ensure that they do not carry out unintended actions in response to D-Bus +method calls. + +In [iOS], any app-bundle can define non-standard URI schemes that it will +handle, and these non-standard URI schemes are the basis for inter-app +communication. There is no particular correlation between the +URI scheme and the app's identity (the iOS equivalent of our bundle IDs), +and there have been successful attacks against this, including the +[URL masque attack] identified by FireEye. + +### Continuing to run in the background + +[Agents] do not show any graphical windows, so to be useful they must +always run in the background. + +Graphical programs that have windows open, but no windows visible to the user, +might be terminated by the application framework. The author of a graphical +program that needs to be available without delay might wish to request that +it is not terminated. + +#### Security implications + +Background programs consume resources, impacting +availability (denial of service). A background program that has other +permissions might make use of them without the user's knowledge: for example, +if a restaurant guide can track the user's location, this can be mitigated +by only allowing it to run, or only allowing it to make use of its +permissions, while it is (or was recently) visible, so that +the user can only be tracked by the guide's author at times when they are +aware that this is a possibility. + +Users might wish to be aware of which graphical programs have this property, +and user interfaces for managing permissions might display it in the same +context as other permissions, but it is not a permission in the sense that +it is used to generate security policies. Accordingly, it should potentially +be handled outside the scope of this document. + +Future work on this topic is tracked in Apertis task +[T3438](https://phabricator.apertis.org/T3438) and its future subtasks. 
+ +#### In other systems + +Android does not have permissions that influence its behaviour for +background programs. + +Flatpak does not currently attempt to monitor background programs or force +them to exit. + +iOS manages background programs via the [UIBackgroundModes] and +[UIApplicationExitsOnSuspend] metadata fields. +[NSSupportsAutomaticTermination] is analogous, but is for desktop macOS. + +### Running on device startup + +An [agent][Agents] might be run on device startup. + +A typical graphical program has no need to start running on device startup. + +A graphical program that is expected to be frequently but intermittently +used might be pre-loaded (but left hidden) on device startup. + +The security implications are essentially the same as +[continuing to run in the background]. + +Users might wish to be aware of which graphical programs have this property, +and user interfaces for managing permissions might display it in the same +context as other permissions, but it is not a permission in the sense that +it is used to generate security policies. Accordingly, it is treated as +outside the scope of this document. + +We suggest that this should be handled alongside +[continuing to run in the background]. + +#### In other systems + +In [Android], a graphical program or service that runs in the background +would have the `RECEIVE_BOOT_COMPLETED` permission, which is specifically +described as covering performance and not security. + +[Flatpak] does not natively provide this functionality. + +iOS manages autostarted background programs via certain values of the +[UIBackgroundModes] metadata field. + +## Non-use-cases + +The following use cases are specifically excluded from the scope of this +document. + +### App's own data + +Each app-bundle should be allowed to read and write its own data, including +its own [app settings]. However, this should not need any special permissions, +because it should be granted to every app-bundle automatically: accordingly, +it is outside the scope of this document. App settings are part of the scope +of the [Preferences and Persistence] concept design, and other per-app private +data are in the scope of the [Applications] concept design. + +Similarly, programs from each app-bundle should be allowed to communicate +with other programs from the same app-bundle (using any suitable +mechanism, including D-Bus) without any special permissions, with the +typical use-case being a user interface communicating with an associated +[agent][agents]. Because it does not require special permissions, that +is outside the scope of this document. + +### Platform services + +This permissions framework is not intended for use by platform services, +regardless of whether they are upstream projects (such as systemd, +dbus-daemon and Tracker), developed specifically for Apertis (such as +the Canterbury app manager, the Newport download manager and the Ribchester +volume mounting service), or developed for a particular vendor (such +as the compositor that implements a vendor-specific user interface, for +which the [Mildenhall Compositor] is a reference implementation). +Platform services should continue to contain their own AppArmor profiles, +polkit rules and other security metadata. + +### Built-in app-bundles with specialized requirements + +This permissions framework is not intended for use by built-in application +bundles with specialized or highly-privileged requirements, such as a +built-in application that communicates directly with specialized hardware. 
+These built-in application bundles should have their own AppArmor profiles,
+polkit rules and other security metadata.
+
+### Driving cameras
+
+Some vehicles have external cameras for purposes such as facilitating
+reversing, watching for hazards in the vehicle's blind spots, or improving
+night vision by using thermal imaging.
+
+Our understanding is that images from these cameras should only be made
+available to platform components or to specialized built-in app-bundles,
+so they are outside the scope of this document.
+
+### Infotainment cameras
+
+[Android] and [iOS] mobile phones and tablets typically have one or more
+cameras directed at the user or their surroundings, intended for photography,
+videoconferencing, augmented reality and entertainment. Our understanding is
+that this is not a normal use-case for an automotive operating system that
+should minimize driver distraction.
+
+If a vehicle does have such cameras, their use cases and security
+implications are very similar to audio recording, so we believe there
+is no need to describe them in detail in this document.
+
+### App-specific permissions
+
+In [Android], any app-bundle can declare its own unique permissions namespaced
+by its author's reversed domain name, and any other app-bundle can request
+those permissions. It is not clear how an app-store vendor can be expected
+to make an informed decision about whether those requests are legitimate.
+
+If an app-bundle signed by the same author requests one of these permissions,
+it is automatically granted; Android documentation recommends this route.
+
+If an app-bundle by a different author requests one of these
+app-specific permissions, a description provided by the
+app-bundle that declared the permission is shown to the user when they are
+choosing whether to allow the requesting app-bundle to be installed. If
+the requesting app-bundle is installed before the declaring app-bundle,
+then its request to use that permission is silently denied.
+
+[Flatpak] does not directly have this functionality, although cooperating
+app-bundles can be given `talk` access to each other's D-Bus well-known
+names.
+
+[iOS] does not appear to have this functionality.
+
+We recommend that this feature not be considered in the short term.
+
+## General notes on other systems
+
+Specific permissions corresponding to those for which we see a need in
+Apertis are covered in the individual use cases above. This section describes
+other operating systems and app frameworks in more general terms.
+
+### Android
+
+Android includes permissions in its XML manifest file.
+
+* [Introduction](https://developer.android.com/guide/topics/manifest/manifest-intro.html)
+* [Permission API reference](https://developer.android.com/reference/android/Manifest.permission.html)
+* [Permission group API reference](https://developer.android.com/reference/android/Manifest.permission_group.html)
+* [Declaring that a permission is needed](https://developer.android.com/guide/topics/manifest/uses-permission-element.html)
+
+Android apps can declare new permissions in the XML manifest.
+
+* [Permission element](https://developer.android.com/guide/topics/manifest/permission-element.html)
+* [Permission group element](https://developer.android.com/guide/topics/manifest/permission-group-element.html)
+* [Permission tree element](https://developer.android.com/guide/topics/manifest/permission-tree-element.html)
+
+Since Android 6.0, permissions that Android classifies as dangerous are
+requested at runtime, when the app first needs them, rather than being
+granted at installation time; they must still be declared in the manifest.
+
+#### Permissions not described in this document
+
+The following access permissions, available as of API level 25, do not match
+any use-case described in this document. Deprecated and unsupported
+permissions have been ignored when compiling this document.
+
+Normal permissions:
+
+* `ACCESS_LOCATION_EXTRA_COMMANDS`
+* `ACCESS_NETWORK_STATE`
+* `ACCESS_NOTIFICATION_POLICY`
+* `ACCESS_WIFI_STATE`
+* `ADD_VOICEMAIL`
+* `BATTERY_STATS`
+* `BODY_SENSORS`
+* `BROADCAST_STICKY`
+* `CAMERA`
+* `CHANGE_NETWORK_STATE`
+* `CHANGE_WIFI_MULTICAST_STATE`
+* `CHANGE_WIFI_STATE`
+* `DISABLE_KEYGUARD`
+* `EXPAND_STATUS_BAR`
+* `GET_ACCOUNTS`
+* `GET_ACCOUNTS_PRIVILEGED`
+* `GET_PACKAGE_SIZE`
+* `INSTALL_SHORTCUT`
+* `KILL_BACKGROUND_PROCESSES`
+* `NFC`
+* `PROCESS_OUTGOING_CALLS`
+* `READ_CALL_LOG`
+* `READ_EXTERNAL_STORAGE`
+* `READ_PHONE_STATE`
+* `READ_SMS`
+* `READ_SYNC_SETTINGS`
+* `READ_SYNC_STATS`
+* `RECEIVE_MMS`
+* `RECEIVE_SMS`
+* `RECEIVE_WAP_PUSH`
+* `REORDER_TASKS`
+* `REQUEST_IGNORE_BATTERY_OPTIMIZATIONS`
+* `REQUEST_INSTALL_PACKAGES`
+* `SEND_SMS`
+* `SET_ALARM`
+* `SET_TIME_ZONE`
+* `SET_WALLPAPER`
+* `SET_WALLPAPER_HINTS`
+* `TRANSMIT_IR`
+* `USE_FINGERPRINT`
+* `USE_SIP`
+* `VIBRATE`
+* `WAKE_LOCK`
+* `WRITE_CALL_LOG`
+* `WRITE_EXTERNAL_STORAGE`
+* `WRITE_SETTINGS`
+* `WRITE_SYNC_SETTINGS`
+
+Permissions described as not for use by third-party applications:
+
+* `ACCOUNT_MANAGER`
+* Several permissions starting with `BIND_` that represent the ability to
+  bind to the identity of a platform service, analogous to the ability to
+  own platform services' D-Bus names in Apertis
+* `BLUETOOTH_PRIVILEGED`
+* Several permissions starting with `BROADCAST_` that represent the ability
+  to broadcast messages, analogous to the ability to own a platform service's
+  D-Bus name and send signals in Apertis
+* `CALL_PRIVILEGED`
+* `CAPTURE_AUDIO_OUTPUT`
+* `CAPTURE_SECURE_VIDEO_OUTPUT`
+* `CAPTURE_VIDEO_OUTPUT`
+* `CHANGE_COMPONENT_ENABLED_STATE`
+* `CLEAR_APP_CACHE`
+* `CONTROL_LOCATION_UPDATES`
+* `DELETE_CACHE_FILES`
+* `DELETE_PACKAGES`
+* `DIAGNOSTIC`
+* `DUMP`
+* `FACTORY_TEST`
+* `GLOBAL_SEARCH`, held by the global search framework to give it permission
+  to contact every global search provider
+* `INSTALL_LOCATION_PROVIDER`
+* `INSTALL_PACKAGES`
+* `LOCATION_HARDWARE`
+* `MANAGE_DOCUMENTS`
+* `MASTER_CLEAR`
+* `MEDIA_CONTENT_CONTROL`
+* `MODIFY_PHONE_STATE`
+* `MOUNT_FORMAT_FILESYSTEMS`
+* `MOUNT_UNMOUNT_FILESYSTEMS`
+* `PACKAGE_USAGE_STATS`
+* `READ_FRAME_BUFFER`
+* `READ_LOGS`
+* `READ_VOICEMAIL`
+* `REBOOT`
+* `SEND_RESPOND_VIA_MESSAGE`
+* `SET_ALWAYS_FINISH`
+* `SET_ANIMATION_SCALE`
+* `SET_DEBUG_APP`
+* `SET_PROCESS_LIMIT`
+* `SET_TIME`
+* `SIGNAL_PERSISTENT_PROCESSES`
+* `STATUS_BAR`
+* `SYSTEM_ALERT_WINDOW`
+* `UPDATE_DEVICE_STATS`
+* `WRITE_APN_SETTINGS`
+* `WRITE_GSERVICES`
+* `WRITE_SECURE_SETTINGS`
+* `WRITE_VOICEMAIL`
+
+#### Intents
+
+Holding a permission is not required to use an *intent* that implicitly asks
+the user for permission, such as taking a photo by sending a request to the
+system camera application, which will pop up a viewfinder,
allowing the user either to take a photo when they are ready, or to cancel by
+pressing the Back button; if the user takes a photo, it is sent back to the
+requesting application as the result of the intent.
+This is conceptually similar to Flatpak [portals].
+
+### Flatpak
+
+Flatpak does not have a single flat list of permissions. Instead, its
+permissions are categorized according to the resource being controlled.
+Available permissions include:
+
+* Hardware-accelerated graphics rendering via [Direct Rendering Manager]
+  devices
+* Hardware-accelerated virtualization via [Kernel-based Virtual Machine]
+  devices
+* Full access to the host's device nodes
+* Sharing specific filesystem areas on a read-only or read/write basis
+* Sharing the host's X11 socket (not used in production on Apertis)
+* Sharing the host's Wayland socket (always available to graphical programs
+  on Apertis)
+* Full access to the host's D-Bus session bus
+* Full access to the host's D-Bus system bus
+* Full access to the host's PulseAudio socket
+* Sharing the host system's network namespace (Internet and LAN access)
+* Sharing the host system's IPC namespace (this does not control D-Bus
+  or `AF_UNIX` sockets, but would allow the app-bundle to be treated as
+  unconfined for the purposes of services that use
+  [Unix System V IPC][svipc(7)] or [POSIX message queues][mq_overview(7)])
+* Sending and receiving messages to communicate with a specific D-Bus
+  well-known name (`talk` access)
+* Permission to own (provide) specific D-Bus well-known names (`own` access)
+
+#### Portals
+
+[Flatpak portals] are similar to Android [intents]. These components
+expose a subset of desktop functionality as D-Bus services that
+can be used by contained applications: they are part of the security
+boundary between a contained app and the rest of the desktop session.
+The aim is for portals to get the user's permission to carry out actions,
+while keeping it as implicit as possible, avoiding an "are you sure?" step
+where feasible. For example, if an application asks to open a file,
+the user's permission is implicitly given by them selecting the file
+in the file-chooser dialog and pressing OK: if they do not want this
+application to open a file at all, they can deny permission by cancelling.
+Similarly, if an application asks to stream webcam data, the expected
+UX is for GNOME's Cheese app or a similar non-GNOME app to appear and
+open the webcam with a preview window, so that the user can see what they
+are about to send, but not actually start sending the stream to the
+requesting app until the user has pressed a "Start" button. When defining
+the API "contracts" to be provided by applications in that situation,
+portal designers need to be clear about whether the provider is expected to
+obtain confirmation like this: in most cases we anticipate that it will be
+expected to do this.
+
+If this sort of implicit permission is not feasible for a particular portal,
+it is possible for the portal implementation to fall back to a model
+similar to [iOS], by asking the user for explicit consent to access
+particular data. Flatpak provides a portal-facing API (the *permissions
+store*) with which a portal can check whether the user already gave
+permission for particular operations, or store the fact that the user
+has now given permission. Each portal can define its own permissions,
+but app-bundles cannot normally do so.
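+
+As a sketch of how a portal implementation might consult the permissions
+store, the following C fragment uses GIO's D-Bus support. The bus, object
+path and interface names are those of the upstream `xdg-permission-store`
+as we understand it, and the table and entry names are purely illustrative
+assumptions:
+
+```c
+#include <gio/gio.h>
+
+/* Look up which app-bundles have been granted an entry in a
+ * (hypothetical) "notifications" table of the permissions store. */
+int
+main (void)
+{
+  GError *error = NULL;
+  GDBusConnection *bus = g_bus_get_sync (G_BUS_TYPE_SESSION, NULL, &error);
+  GVariant *reply;
+
+  if (bus == NULL)
+    {
+      g_printerr ("Unable to connect to session bus: %s\n", error->message);
+      g_error_free (error);
+      return 1;
+    }
+
+  reply = g_dbus_connection_call_sync (bus,
+      "org.freedesktop.impl.portal.PermissionStore",
+      "/org/freedesktop/impl/portal/PermissionStore",
+      "org.freedesktop.impl.portal.PermissionStore",
+      "Lookup",
+      g_variant_new ("(ss)", "notifications", "notification"),
+      G_VARIANT_TYPE ("(a{sas}v)"),
+      G_DBUS_CALL_FLAGS_NONE, -1, NULL, &error);
+
+  if (reply != NULL)
+    {
+      gchar *printed = g_variant_print (reply, TRUE);
+      g_print ("Stored permissions: %s\n", printed);
+      g_free (printed);
+      g_variant_unref (reply);
+    }
+  else
+    {
+      g_printerr ("Lookup failed: %s\n", error->message);
+      g_error_free (error);
+    }
+
+  g_object_unref (bus);
+  return 0;
+}
+```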
+
+There is currently no user interface for the user to review previously-granted
+permissions and revoke them if desired, but one could be added in future,
+again similar to [iOS].
+
+Unlike Android intents, different Flatpak portals use different mechanisms
+to send the result of a request to the portal back to the requesting
+app-bundle. For example, many portals send and receive small requests
+and results over D-Bus, but the file chooser makes the selected file
+available in a FUSE filesystem that is visible inside the Flatpak sandbox.
+This avoids having to stream the whole file over D-Bus, which could be very
+slow and inefficient, particularly if the file is very large and the app will
+carry out random access within it (such as seeking within a video).
+
+More information on Flatpak portals can be found in
+the article [The flatpak security model, part 3].
+
+### iOS
+
+The iOS 10 model for permissions is a hybrid of the [intents]/[portals]
+approaches and Android's approach of pre-declaring permissions. Apps that
+need access to sensitive APIs (analogous to portals) must provide a
+description of why that access is required. This gives the app-store curator
+an opportunity to check that these permissions make sense, as with Android
+permissions. However, unlike Android, user consent is requested at the time
+the app tries to exercise that access, not during installation. The given
+description is included in the prompt, and can be used to justify why
+access is needed.
+
+There is also a user interface for the user to review previously-granted
+permissions, and revoke them if desired.
+
+#### Permissions not described in this document
+
+The usage descriptions that are the closest equivalent of permissions in iOS
+appear to be a subset of the [Cocoa `Info.plist` keys], where
+`Info.plist` is the iOS equivalent of our [application bundle metadata].
+They exist in the same namespace as non-permission-related keys such as
+human-readable copyright notices.
+ +Usage descriptions not corresponding to a use-case in this document include: + +* `NSCameraUsageDescription` +* `NSHealthShareUsageDescription` +* `NSHealthUpdateUsageDescription` +* `NSHomeKitUsageDescription` +* `NSMotionUsageDescription` (accelerometer) +* `NSRemindersUsageDescription` +* `NSSiriUsageDescription` +* `NSSpeechRecognitionUsageDescription` +* `NSVideoSubscriberAccountUsageDescription` + +<!-- Links to other concept designs --> + +[Agents]: applications.md#start +[Android manifest]: http://developer.android.com/reference/android/Manifest.permission.html +[Applications]: applications.md +[Application bundle metadata]: application-bundle-metadata.md +[App settings]: preferences-and-persistence.md#app-settings +[Application list privacy]: application-entry-points.md#security-and-privacy-considerations +[Integrity, confidentiality and availability]: security.md#integrity-confidentiality-and-availability +[Media management concept design]: media-management.md +[Preferences and Persistence]: preferences-and-persistence.md +[Security]: security.md +[Shared data]: application-layout.md#shared-data +[System settings]: preferences-and-persistence.md#system-settings +[System preferences application]: preferences-and-persistence.md#user-interface +[User settings]: preferences-and-persistence.md#user-settings + +<!-- Apertis links --> + +[Content handover]: https://wiki.apertis.org/Content_hand-over +[Egress filtering]: https://wiki.apertis.org/Egress_filtering +[Interface discovery]: https://wiki.apertis.org/Interface_discovery +[Mildenhall Compositor]: https://gitlab.apertis.org/hmi/mildenhall-compositor +[Mildenhall Launcher]: https://gitlab.apertis.org/hmi/mildenhall-launcher +[Mildenhall Settings]: https://gitlab.apertis.org/hmi/mildenhall-settings +[Restricting interface providers]: https://wiki.apertis.org/Interface_discovery#Restricting_who_can_advertise_a_given_interface_2 +[Sharing]: https://wiki.apertis.org/Sharing + +<!-- External links --> + +[Android AppOps]: https://www.theguardian.com/technology/2015/jun/09/google-privacy-apple-android-lockheimer-security-app-ops +[Andoid app permissions]: https://www.howtogeek.com/230683/how-to-manage-app-permissions-on-android-6.0/ +[Android calendar permissions]: https://developer.android.com/guide/topics/providers/calendar-provider.html#manifest +[Android calendar sync adapters]: https://developer.android.com/guide/topics/providers/calendar-provider.html#sync-adapter +[Android contact management]: https://developer.android.com/guide/topics/providers/contacts-provider.html +[AOSP Calendar]: https://fossdroid.com/a/standalone-calendar.html +[Calendar intents]: https://developer.android.com/guide/topics/providers/calendar-provider.html#intents +[Cocoa `Info.plist` keys]: https://developer.apple.com/library/content/documentation/General/Reference/InfoPlistKeyReference/Articles/CocoaKeys.html +[Direct Rendering Manager]: https://en.wikipedia.org/wiki/Direct_Rendering_Manager +[Flatpak Portals]: https://github.com/flatpak/flatpak/wiki/Portals +[GNOME Calendar]: https://wiki.gnome.org/Apps/Calendar +[iOS fingerprinting]: https://arxiv.org/abs/1605.08664 +[iOS Privacy settings]: https://www.howtogeek.com/177711/ios-has-app-permissions-too-and-theyre-arguably-better-than-androids/ +[Kernel-based Virtual Machine]: https://en.wikipedia.org/wiki/Kernel-based_Virtual_Machine +[mq_overview(7)]: https://manpages.debian.org/mq_overview(7) +[NSBluetoothPeripheralUsageDescription]: 
https://developer.apple.com/library/content/documentation/General/Reference/InfoPlistKeyReference/Articles/CocoaKeys.html#//apple_ref/doc/uid/TP40009251-SW20
+[NSSupportsAutomaticTermination]: https://developer.apple.com/library/content/documentation/General/Reference/InfoPlistKeyReference/Articles/CocoaKeys.html#//apple_ref/doc/uid/TP40009251-SW13
+[Permissions on-demand]: https://savvyapps.com/blog/how-to-create-better-user-permission-requests-in-ios-apps
+[svipc(7)]: https://manpages.debian.org/svipc(7)
+[The Flatpak security model, part 3]: https://blogs.gnome.org/alexl/2017/01/24/the-flatpak-security-model-part-3-the-long-game/
+[UIBackgroundModes]: https://developer.apple.com/library/content/documentation/General/Reference/InfoPlistKeyReference/Articles/iPhoneOSKeys.html#//apple_ref/doc/uid/TP40009252-SW22
+[UIApplicationExitsOnSuspend]: https://developer.apple.com/library/content/documentation/General/Reference/InfoPlistKeyReference/Articles/iPhoneOSKeys.html#//apple_ref/doc/uid/TP40009252-SW23
+[URL masque attack]: https://www.fireeye.com/blog/threat-research/2015/04/url_masques_on_apps.html
+[XCodeGhost]: https://en.wikipedia.org/wiki/XcodeGhost
+
+<!-- vim:set sw=4 sts=4 et: -->
diff --git a/content/designs/platform.md b/content/designs/platform.md
new file mode 100644
index 0000000000000000000000000000000000000000..56c6d6e99baea3f4b9a6a6e4c0ca3fb9fd30af6c
--- /dev/null
+++ b/content/designs/platform.md
@@ -0,0 +1 @@
+# Apertis Platform
diff --git a/content/designs/preferences-and-persistence.md b/content/designs/preferences-and-persistence.md
new file mode 100644
index 0000000000000000000000000000000000000000..a6af7534fed3779d31864075a302b16847b214ed
--- /dev/null
+++ b/content/designs/preferences-and-persistence.md
@@ -0,0 +1,1862 @@
+---
+title: Preferences and persistence
+short-description: Preferences and persistent data storage for applications and services
+ (implemented)
+authors:
+ - name: Philip Withnall
+---
+
+# Preferences and persistence
+
+## Introduction
+
+This document describes how system services and apps in Apertis may store
+preferences and persistent data. It considers the security architecture
+for storage and access to these data; separation of schemas, default
+values and user-provided values; and guidelines for how to present
+preferences in the UI.
+
+The [Applications Design](applications.md) and the
+[Global Search Design](global-search.md) documents are relevant reading:
+both reference the need for storage of persistent data for apps.
+See [][Overall architecture] for a design covering this.
+
+The [Robustness Design](robustness.md) document gives more detail on the requirements for
+robustness of main storage in the face of power loss.
+
+## Terminology and concepts
+
+### System settings
+
+A *system setting* is one which does not vary by user, and applies to
+the entire system. For example, networking settings. This document
+considers system settings which must be readable by multiple components
+— settings which are solely for the use of a single system service are
+out of scope, and may be stored in whichever way that service wishes
+(typically as a configuration file in /etc). This is particularly
+important for sensitive settings, for example the shadow user database
+in /etc/shadow, which must not be readable by anything except the system
+authentication service (PAM).
+
+### User settings
+
+A *user setting* is one which does vary by user, but not by app.
User
+settings apply to the whole of a user's session. For example, the
+language or theme.
+
+### App settings
+
+An *app setting* is one which varies by user and also by app. Throughout
+this document, the term ‘app’ is used to mean an app-bundle, including
+the UI and any associated agent programs, analogous to an Android .apk,
+with a single security domain shared between all executables in the
+bundle. The precise terminology is currently under discussion, and this
+document will be updated to reflect the result of that.
+
+App settings apply only to a specific app, and would not make sense
+outside the context of that app. For example, whether to enable
+shuffling tracks in the media player; whether to open hyperlinks in a
+new tab by default in the web browser; or the details for accessing a
+user's e-mail account.
+
+### Preferences
+
+'*Preferences*' is the general term for system, user and app settings.
+The terms 'preference' and 'setting' will be used interchangeably
+throughout this document.
+
+### User services
+
+A *user service* is as defined in the Multiuser Design document — a
+service that runs on behalf of a particular user. Throughout this
+document, this is additionally assumed to mean a *platform* user
+service, which is not tied to a particular app-bundle. The alternative
+is an *agent* user service, which this document considers part of an
+app-bundle, with the same access to settings as the app-UI.
+
+### Persistent data
+
+Persistent data is app state which persists across multiple user
+sessions. For example, documents which the user has written, or the
+state of the user's pending downloads.
+
+One distinguishing factor between preferences and persistent data is
+that vendors may override the default values for preferences (see
+[][Vendor overrides]), but not for persistent data. For example, a vendor would
+not want to override information about in-progress downloads; but they
+might want to override the default background image filename for a user.
+
+The persistent data for an app may be the same as the data it shares
+between user sessions, or may differ. The difference between persistent
+data and data for sharing between apps is discussed in the Multiuser
+Design document.
+
+Persistent data is stored on main storage, whereas shared data is
+expected to be passed in memory — so while the sets of data are the
+same, the mechanisms used to handle them are different. Persistent data
+is always private to an app, and cannot be read by another app or user.
+
+Persistent data might cover all state in an application — such that
+restoring its persistent data when starting the application is
+sufficient to make it appear as if it had been suspended, rather than
+exited. Or persistent data might cover some subset of this. The decision
+is up to the application authors.
+
+### Main storage
+
+A flash disk, hard disk, or other persistent data storage medium which
+can be used by the system. This term has been chosen rather than the
+more common *persistent storage* to avoid confusion with persistent
+data.
+
+### GSettings
+
+[GSettings] is an interface provided by GLib for accessing settings.
+As an interface, it can be backed by different storage backends — the
+most common is dconf, but a key file backend is available for storage in
+simple key files.
+
+GSettings uses a concept of 'schemas', which define available settings,
+their data types, and their default values. Each setting is strictly
+typed and must have a default value.
A schema has an ID, and is
+'instantiated' at one or more schema paths. Typically, a schema will be
+instantiated at a single path, but may be instantiated at multiple paths
+to support storing the same settings for multiple objects. For example,
+a schema for an e-mail account could require a server name, username and
+protocol, and be instantiated at [multiple paths][GSettings-relocatable], one path for each
+configured e-mail account.
+
+### AppArmor
+
+[AppArmor] is an access control framework used by Apertis to enforce
+fine-grained permissions across the entire system, restricting which
+files each process can open.
+
+## Requirements
+
+### Access permissions
+
+Access controls must be enforceable on preferences. Read and write
+permissions must be available. It is assumed that if a component has
+read permission for a preference, it may also be notified of any changes
+to that preference's value. It is assumed that if a component has write
+permission for a preference, it may also reset that preference.
+
+A suggested security policy for preferences implements a downwards flow
+for **reads**:
+
+  - **Apps** may read their own app settings, user settings for the
+    current user, and all system settings.
+
+  - **User services** may read the user’s application settings, user
+    settings for the current user, and all system settings.
+
+  - **System services** may read their own app settings, and all system
+    settings.
+
+**Writes** are generally only allowed at the same level:
+
+  - **Apps** may write their own app settings.
+
+  - **User services** may write user settings for the current user.
+
+  - **System services** may write system settings for all users, user
+    settings for any user, and app settings for any app for any user.
+
+Note that apps must not be able to read or write each others' settings.
+Similarly for user services and system services.
+
+Persistent data is always private to a (user, app) pair, though it can
+be accessed by user services and system services.
+
+### Writability
+
+As well as the value of a preference, components must be able to find
+out whether the preference is writable. A preference may be read-only if
+the component doesn't have write permission for it ([][Access permissions]) or if
+it is locked down by the vendor ([](#vendor-lockdown)).
+
+This does not apply to persistent data, which is always read–write by
+the (user, app) pair which owns it.
+
+### Rollback
+
+As per section 4.1.5 of the Applications Design document, and section 6
+of the System Update and Rollback Design document, applications must
+support rollback to a previously installed version, including restoring
+the user’s settings for that application by reverting the stored
+preferences to those from the earlier version. The storage backends for
+the preferences and persistence APIs must support restoring stored
+preferences from an earlier version — they should not support
+context-sensitive conversion of newer preferences to older ones.
+
+Applications do not have to support running with preferences or
+persistent data from a newer version than the application code.
+
+### System and app bundle upgrades
+
+As per the Applications Design and the System Update and Rollback
+design, applications must also support upgrading preferences and
+persistent data from previous application versions to the current
+version.
+
+They do not need to support downgrading preferences or persistent data
+by converting it from a newer version to an older one.
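+
+As a concrete illustration of one such upgrade step, the following minimal
+C sketch migrates a user-set value from a deprecated key to its replacement
+using the GSettings API described above. The schema ID and key names are
+hypothetical, and both keys are assumed to still be declared in the
+installed schema, since GSettings aborts on unknown keys:
+
+```c
+#include <gio/gio.h>
+
+/* Hypothetical upgrade step: move the user's value from a deprecated
+ * key to its replacement. Both keys must remain declared in the schema
+ * (the old one can be marked deprecated), and are assumed to share the
+ * same value type. */
+static void
+migrate_volume_key (void)
+{
+  GSettings *settings = g_settings_new ("net.example.MyApplication");
+  GVariant *old_value = g_settings_get_user_value (settings, "volume-percent");
+
+  if (old_value != NULL)
+    {
+      /* Only migrate if the user had actually changed the old key. */
+      g_settings_set_value (settings, "volume", old_value);
+      g_settings_reset (settings, "volume-percent");
+      g_variant_unref (old_value);
+    }
+
+  g_object_unref (settings);
+}
+```
+
+Running a step like this once, when the new version of the app bundle first
+starts, keeps the user's chosen value across the upgrade without requiring
+any platform support beyond GSettings itself.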
+
+### Factory reset
+
+The system must provide some means for the user to reset the state of
+all apps to a factory default for a particular user, or for all users.
+This is necessary to support removing user accounts, refreshing the
+car for transfer to a new owner, or clearing the state of a temporary
+guest account (see the Multiuser Design document). Similarly, it must
+support clearing the state of a single (user, app) pair.
+
+The factory reset must support resetting preferences, persistent data,
+or both.
+
+### Abstraction level
+
+The preferences and persistent data APIs may want to abstract the
+underlying storage backend, for example to support uniform access to
+preferences stored in multiple locations. If so, details of the
+underlying storage backend must not be present in the abstraction (a
+'leaky abstraction') — for example, SQL fragments must not be used in
+the interface, as they tie the implementation to an SQL-based backend
+and a specific schema.
+
+Conversely, any more than one layer of abstraction is an unnecessary
+complication.
+
+### Minimising I/O bandwidth
+
+As with all components which use main storage, the preferences and
+persistent data stores should minimise the I/O load they impose on main
+storage. This is a particular concern at system startup, where typically
+a lot of data must be loaded from main storage, and hence I/O read
+efficiency is important.
+
+### Atomic updates
+
+The system must make atomic writes to main storage, so that preferences
+or persistent data are not corrupted or lost if power is lost part-way
+through saving changes.
+
+An atomic write is one where the stored state is either the old state,
+or the new state, but never an intermediate between the two, and never
+missing entirely. In other words, if power is lost while updating a
+preference, upon rebooting either the old value of the preference must
+be loadable, or the new value must be loadable.
+
+See the Robustness Design document, §3.1.1 for more details on general
+robustness requirements.
+
+### Transactional updates
+
+The system must allow updates to preferences to be wrapped in
+transactions, such that either all of the preferences within a
+transaction are updated, or none of them are. Transactions must be
+revertable before being applied permanently.
+
+### Performance tradeoffs
+
+Preferences are typically written infrequently and read frequently;
+access patterns for persistent data depend on the app. The
+implementation should play to those access patterns, for example by
+using locking which favours readers over writers.
+
+### Data size tradeoffs
+
+It is not expected that preference values will be large — a few tens of
+kilobytes at most. Conversely, persistent data may range in size from a
+few bytes to many megabytes. The implementation should use a storage
+format suitable to the expected data size.
+
+### Concurrency control
+
+As system preferences may affect security policy, reading them should be
+race-free, particularly from [time-of-check-to-time-of-use][TTCTTOU] race conditions.
+For example, if a preference is changed by process C while process R is
+reading it, process R must either see the new value of the preference,
+or see the old value of the preference *and* subsequently be notified
+that it has changed.
+
+Similarly for persistent data.
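+
+With GSettings, the read-plus-notification pattern this requirement
+describes looks like the following minimal C sketch: connecting to change
+notification before the initial read guarantees that a concurrent change is
+either seen immediately or reported through the callback. The
+org.gnome.system.locale schema is one of the shared schemas mentioned later
+in this document; the `region` key is assumed to exist in it:
+
+```c
+#include <gio/gio.h>
+
+/* Called whenever the watched key changes, and once for the initial read. */
+static void
+on_region_changed (GSettings *settings, const gchar *key, gpointer user_data)
+{
+  gchar *region = g_settings_get_string (settings, key);
+  g_print ("Locale region is now '%s'\n", region);
+  g_free (region);
+}
+
+static GSettings *
+watch_region (void)
+{
+  GSettings *settings = g_settings_new ("org.gnome.system.locale");
+
+  /* Subscribe first, then read: any change racing with the read will
+   * still be delivered to the callback afterwards. */
+  g_signal_connect (settings, "changed::region",
+                    G_CALLBACK (on_region_changed), NULL);
+  on_region_changed (settings, "region", NULL);
+  return settings;
+}
+```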
+ +### Vendor overrides + +It may be desirable to support *vendor overrides*, where a vendor +shipping Apertis can change the default values of the (app, user or +system) preferences before shipping to the end user. For example, they +may change the default background image shown to the user. + +If these are supported, resetting a preference to its default value (for +example, if doing a [][Factory reset]) must restore it to the +vendor-supplied default, rather than the Apertis default. There is no +need to be able to access the Apertis default at any time. + +This does not apply to persistent data. + +### Vendor lockdown + +It may also be desirable to support *vendor lockdowns*, where a vendor +shipping Apertis can lock a preference so that end users or +non-privileged applications may not change it. For example, they may +wish to lock the URI which is checked for system updates. + +This does not apply to persistent data. + +### User interface + +There must be some user interface (UI) for setting preferences. This may +be provided by a system preferences application, as a separate window in +each application, or as individual widgets embedded throughout an +application’s interface; or a combination of these options. + +This does not apply to persistent data. + +### Control over user interface + +It must be possible for the vendor to have complete control over the way +preferences are presented if all applications’ preferences are presented +in a system preferences application. + +This does not apply to persistent data. + +### Rearrangeable preferences + +It must be possible for a vendor to rearrange the preferences from +applications if they are presented in a system preferences application, +so that (for example) all ‘privacy’ preferences are presented in a page +together. + +### Searchable preferences + +It must be possible for a system preferences application provided by the +vendor to allow the user to search all preferences from all +applications. + +### Storage of user secrets and passwords + +There must be a secure way to store user secrets and passwords, which +preserves confidentiality of these data. This may be separate from the +main preferences or persistent data stores. + +### Preferences hard key + +There must be support for a preferences hard key (a physical button in +the vehicle) which when pressed causes the currently active +application’s settings to be displayed. If no applications are active, +it could display the system preferences. Some vehicles may not have such +a hard key, in which case the functionality should be ignored. + +## Existing preferences systems + +This chapter describes the conceptual model, user experience and design +elements used in various non-Apertis operating systems' support for +preferences and persistent data, because it might be useful input for +decision-making. Where available, it also provides some details of the +implementations of features that seem particularly interesting or +relevant. + +### GNOME Linux desktop + +#### Preferences + +On a modern GNOME desktop, from which Apertis uses a lot of components, +settings are stored in multiple places. + + - **System settings**: Stored in /etc by each system service, + typically in a text file with a service-specific format. A lot of + them have a system-wide default value, and may be overridden per + user (for example, each user can set their own timezone and locale, + with a system-wide default). 
+ + - **User settings**: Defined by shared GSettings schemas (such as + org.gnome.system.locale), or schemas specific to individual user + services (such as org.freedesktop.Tracker). The values are stored in + dconf (see below). + + - **App settings**: Defined by app-specific GSettings schemas. The + values are stored in dconf (see below). + +[dconf] supports multiple layered databases, each stored separately. +For each settings key, a value set for it in one layer overrides any +values set in the layers below. The bottom (read-only) layer is always +the set of default values which are provided by the schema file. This +layered approach allows the system administrator to change settings +system-wide in a system database, but also allows users to override +those settings in their per-user database. It allows a user to reset all +their settings by deleting their per-user database — at which point, the +values from the next layer down (typically either a system database or +the defaults from schema files) will be used for all settings keys. + +[Lockdown][dconf-lockdown] is supported in dconf in the opposite direction: keys may +be locked down at a particular level, and may not be set at levels above +that one (but may be set at levels below it, as defaults). + +Architecturally, dconf allows direct read-only access to all databases — +each app reads settings values directly from the database. Writes to the +databases are arbitrated through a per-user dconf daemon which then +forces each app to refresh its read-only view of the settings. This +allows for fast concurrent reads of settings, at the cost of making +writes expensive. + +dconf does *not* support access controls, and does not support storing +different schemas in different databases at the same layer. Hence a user +either has write access to the whole of a system database, or write +access to none of it. As the dconf daemon runs per user, any app +accessing the daemon may write to any settings key, either its own app +settings, another app's settings, or the user's settings. + +#### Persistent data + +Persistent data is stored in application-defined formats, in +application-defined locations, although many follow the [XDG Base +Directory Specification][XDG-base-dir-spec], which puts cache data in XDG\_CACHE\_HOME +(typically ~/.cache) and non-cache data in XDG\_DATA\_HOME (typically +~/.local/share). Below these two directories, applications create their +own directories or files as they see fit. There is no security +separation between applications, but the normal UNIX permissions +restrict access to only the current user. + +There are no APIs available in GNOME for automatically persisting an +entire application’s state — if an application wishes to do this, it +must implement its own serialisation and deserialisation functions and +save to a file, as above. + +#### Secrets and passwords + +On a GNOME or KDE desktop, all user secrets, passwords and credentials +are stored using the [Secret Service] API. In GNOME, this API is +implemented by GNOME Keyring; in KDE, by KWallet. + +The [API][Secret Service API] allows storage of byte array ‘secrets’ (such as +passwords), along with non-secret attributes used to look them up, in an +encrypted storage file which must be unlocked by the user before it can +be accessed by applications. Unlocking it may be automatic if the user +does not set a password on the file (or if the password is identical to +the user’s login password). 
Secrets are stored in ‘collections’, which
+may group them for different purposes, and which are encrypted
+separately.
+
+An application must open a session with the secret service in order to
+access secrets. The session may be used to encrypt secrets while they
+are in transit between the service and application, and allows for
+encryption algorithm negotiation for this purpose.
+
+For certain actions, the secret service may need to interact directly
+with the user in order to establish a trusted path to the user, and
+avoid (for example) requiring the user to enter their password into a
+potentially untrusted application for that application to forward it to
+the service.
+
+### Android
+
+#### Preferences
+
+Apps can use the [SharedPreferences class][Android-shared-pref] to read and write
+preferences from named preferences files, with apps typically using a
+single preferences file with a default name. These files are stored
+per-app, and are private to that app by default, but may be shared with
+other apps, either read-only or read–write.
+
+Preferences are strongly typed, and default values are provided by the
+app at runtime. There is no concept of layering or of schemas — all
+definition of the preferences files is handled at runtime.
+
+Preferences are saved to disk immediately.
+
+Android uses a [custom XML format][Android-prefs-def] to allow apps to define
+preference UIs (known as ‘activities’ in Android terminology). This
+format can define simple lists of preferences, through to complex UIs
+with grouped preferences, subscreens, lists of subscreens, and custom
+preference widgets. Implementing features such as making one preference
+conditional on another is possible, but requires complex XML.
+
+A [PreferenceFragment][Android-pref-fragment] can be used to automatically build a screen
+in an application to display preferences, loading them from the XML
+file. It will load the current values of the preferences from the
+SharedPreferences store, and will write new values back to the store as
+the preferences are modified in the UI.
+
+In order for the system to display the preferences for a particular
+application, it must execute one or more of the PreferenceFragment
+classes from that application.
+
+#### Persistent data
+
+Android offers several options for [persistent data][Android-persistent-data]:
+
+  - **Internal storage**: Files in a per-(user, app) directory, which
+    may optionally be made world-readable or writable to allow access to
+    other apps or users (though this is strongly discouraged).
+
+  - **External storage**: Files in a world-readable storage area which
+    is accessible to the user, such as an SD card. Accessible to all
+    other apps and users which hold the READ\_EXTERNAL\_STORAGE or
+    WRITE\_EXTERNAL\_STORAGE permissions.
+
+  - **SQLite database**: Arbitrary app-defined tables in a per-(user,
+    app) SQLite database. This cannot be shared with other apps or
+    users.
+
+  - **Network connection**: Using the normal networking APIs, Android
+    suggests that data can be stored on servers controlled by the app
+    developers. It provides no special API for this.
+
+For saving an application’s state, Android offers a persistence API on
+the [Activity class][Android-activity]. This automatically saves the state of all UI
+elements (such as the text in an entry widget, and the position of a
+list), but cannot automatically save application-specific internal state
+(member variables).
For this, the application must override two toolkit +methods (onSaveInstanceState() and onRestoreInstanceState()) and +implement its own serialisation and deserialisation of state to a set of +key–value pairs which are then stored by Android. + +#### Secrets and passwords + +Android recommends storing secrets and passwords in two ways. For +authentication credentials for online services, it provides an +AccountManager [API][Android-account-API] which abstracts authentication for known online +services (which are supported by pluggable backends, potentially +provided by application bundles) and stores the credentials in an +OS-wide store. The service handles authenticating and re-authenticating +when the login session ends. + +For secrets which are not for online accounts, or otherwise do not fit +the AccountManager pattern, Android [recommends][Android-settings-question] using the normal +preferences API ([][Preferences]), as while preferences are not encrypted +in storage, they are only accessible to the application which owns them, +so cannot be stolen by other applications. However, if the sandboxing +system is compromised (potentially by an attacker with physical access +to the device), the stored secrets will be accessible in plaintext. + +### iOS + +#### Preferences + +iOS stores preferences as [key–value pairs][iOS-key-values], which are separated +into domains by user, application and machine. The same preference may +be set in [multiple domains][iOS-pref-domains], and they are searched in a defined priority +order to determine which value to use. This means that an +application may, for example, choose to share a given preference between +all users of that application on a given machine. + +Application IDs use the standard reverse domain name syntax to ensure +uniqueness. + +Preference values may be any type supported by Core Foundation [property +lists][iOS-prop-lists], including strings, integers and arrays. Default values must +be coded into the application. + +Preference keys may be generated at runtime by the application, and do +not have to be defined in a schema in advance. However, it is typical to +use pre-defined property lists. + +Preferences are synchronised with the on-disk store manually, so the +application chooses when they are written to disk. + +On certain Apple operating systems, [preferences may be ‘managed’ by the +administrator][iOS-pref-practices], setting an override value which overrides any value +set by the user for a given preference key. + +Application preferences can either be presented as part of the +application, using normal UI widgets, and accessing the [NSUserDefaults +class][iOS-NSUserDefaults] for the preference values. Or they can be presented as part +of the [system-wide settings application][iOS-system-defaults], which builds the UI for +each application’s preferences dynamically from that application’s +property list file for preferences. An application may provide multiple +property list files to build a hierarchy of preferences pages. The +system-wide settings application accesses NSUserDefaults on behalf of +the application to update the stored preferences. + +#### Persistent data + +iOS offers several options for persistent data: + + - **Filesystem**: Arbitrary files may be written to the filesystem in + various [app-specific locations][iOS-app-dirs]. + + - **Core Data API**: This is an [object-graph management API][iOS-graph-management], + which allows versioned control of instances of objects created from + a schema. 
Instead of being used by an application to persist data, + this API is designed to form the core of the application’s data + model. It supports editing and discarding edits, undo, redo, + versioning of the object schema, and large data sets. + + - **Property List API**: A property list is a hierarchical, structured + piece of data, consisting of primitive data types, arrays and + dictionaries which may be nested [arbitrarily][iOS-prop-nesting]. Property lists + can therefore be used to store arbitrary application data. There is + an API to serialise them to the file system. + + - **SQLite**: The standard SQLite API may be used, backed by a file, + to store relational data in a database. + +For persisting an entire application’s state, iOS provides a [solution][iOS-full-state-persist] +similar to [Android][Persistent data]. The developer must annotate +each UI view class which needs to be saved and restored, and the UI +toolkit will automatically persist the state of the widgets in that view +when the application is suspended. As with Android, the developer must +implement two methods for serialising and deserialising +application-specific state from member variables: +encodeRestorableStateWithCoder and decodeRestorableStateWithCoder. + +#### Secrets and passwords + +iOS uses the same [keychain API][iOS-keychain] as OS X. This provides a system +service for storing secrets, passwords and certificates. They are +encrypted in storage, using an encryption key which is derived from the +iOS application’s ID and the user’s password. + +The keychain is encrypted in backups, and stored without its encryption +key, so an attacker cannot extract secrets from backups. + +An iOS application can access the secrets it has stored in the keychain, +but cannot access secrets from other applications. There is no way to +(for example) share login details for a given website between all +applications which access that website — they must all query the user +for the details and store them separately. This differs from OS X, where +all applications can access any stored secrets, subject to the user +approving the access (trusting the application). + +### GENIVI + +#### Preferences and persistent data + +GENIVI does not differentiate between preferences and persistent data, +and provides one low-level API for saving and loading persistent data. +It does not support automatically persisting an entire application’s +state. + +The GENIVI [Persistence Management system][GENIVI-persistence] handles all data read and +written during the lifetime of an IVI system. It aims to provide a +standard API for all GENIVI platforms to use, which reliably stores data +in the face of power disturbances, and the limited write-cycle lifetime +of some non-volatile storage devices (flash memory). + +It is split into four components: + + - Client library: API for writing key–value or arbitrary data to a + file, which may be used by only the current application, or shared + between all applications. + + - Administration service: system for installing default values and + configuration for the data storage for each application; backing up + and restoring stored data; and implementing factory reset of data. + + - Common object: used by the other components to access key–value + databases through a caching layer. + + - Health monitor: system under development to implement data recovery + in the case of corruption or loss, using existing backups. 
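+
+The client library's byte-array interface can be illustrated with a short,
+heavily hedged C sketch. The header names, the `pclKeyWriteData` signature
+and the local-database identifier below are our reading of the GENIVI
+persistence client library, and should be treated as assumptions rather
+than a definitive reference:
+
+```c
+#include <stdio.h>
+
+/* Assumed GENIVI persistence client library headers; the installed
+ * include paths may differ between versions. */
+#include "persistence_client_library.h"
+#include "persistence_client_library_key.h"
+
+static void
+save_counter (unsigned int counter)
+{
+  /* The library stores opaque byte arrays, so the application must
+   * serialise its own format (here, a simple decimal string). */
+  char buffer[32];
+  int len = snprintf (buffer, sizeof buffer, "%u", counter);
+
+  /* 0xFF is assumed to select the application-local database; user
+   * number and seat number are both 0 here. pclInitLibrary() is
+   * assumed to have been called during application startup. */
+  pclKeyWriteData (0xFF, "counter", 0, 0,
+                   (unsigned char *) buffer, len);
+}
+```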
+ +The GENIVI Persistence Management system only supports storage of data +as byte arrays — applications must serialise and deserialise their data +formats themselves. Similarly, it does not implement versioning of +stored data. + +The data storage code is implemented as a set of plugins for the client +library, implementing different methods for storing data. There are +various types of plugins implementing layers of functionality such as +hardware information querying, encryption, early loading of data, and +the default storage backend. + +Key–value data is limited to 16KB per key. Keys are stored +per-application, namespaced by an application-chosen arbitrary +identifier. As persistent data is stored in a separate file per +application, Unix users and groups may be used to enforce access control +on the persisted data. + +GENIVI has investigated providing an SQLite API for relational data +storage, and has provided [recommendations for it][GENIVI-relational-API], but has not +shipped a version with SQLite support (as of version 0.3.0 of this +document). + +To persist an application’s state, the developer must manually implement +serialisation and deserialisation of all UI and internal state of the +application using the Persistence client library. + +#### Secrets and passwords + +Similarly, GENIVI has no specialised API for storing secrets and +passwords — applications must use the persistence management system. The +system does allow for encrypted storage of persistent data using a +plugin — but that encrypts all stored data, including preferences and +application state. + +## Approach + +Preferences and persistent data have largely separate requirements: +preferences are small amounts of data; need to be accessed by multiple +components; will typically be read much more frequently than they are +written; and need to support features like [][Vendor overrides] and +[](#vendor-lockdown). Persistent data may vary from +small to large amounts of data; will be read *and* written frequently; +in app-specific formats; and do not need to be accessed by other +components. + +The expected amount of data to be stored, and the relative frequency of +reads and writes of that data, is an important factor in the choice of +storage format to use. Preferences should be stored in a format which is +optimised for reads; persistent data should be stored in a format which +is optimised for frequent reads and writes, since apps should update it +frequently as they may be killed at any time. + +For these reasons, we suggest preferences and persistent data are +handled entirely separately. The following sections (6 and 7) will cover +them separately, giving our recommended approach and justifications +which refer back to the requirements (section 3). + +User secrets and passwords ([][Storage of user secrets and passwords]) have different requirements +again: + + - Confidentiality in storage (encryption). + + - Sharing secrets and passwords for a given resource (such as website) + between all applications using that website (i.e. secrets and + passwords are not necessarily specific to an application, while + preferences typically are). + + - No fixed schema: the credentials required to access a given service + (such as website) may change over time as that service changes. 
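+
+These requirements are met by the freedesktop.org Secret Service API, which
+is recommended in the next paragraph. For illustration, here is a minimal C
+sketch using libsecret; the schema name and attributes are hypothetical,
+chosen for an imaginary e-mail app, and are not part of any Apertis API:
+
+```c
+#include <libsecret/secret.h>
+
+/* Hypothetical schema describing the attributes used to look up a
+ * stored e-mail password. */
+static const SecretSchema example_schema = {
+  "net.example.MyApplication.Mail", SECRET_SCHEMA_NONE,
+  {
+    { "server", SECRET_SCHEMA_ATTRIBUTE_STRING },
+    { "user",   SECRET_SCHEMA_ATTRIBUTE_STRING },
+    { NULL, 0 },
+  }
+};
+
+static gboolean
+store_mail_password (const gchar *server, const gchar *user,
+                     const gchar *password, GError **error)
+{
+  /* The secret is stored in the default collection; the service
+   * encrypts it and enforces the user's unlock policy. */
+  return secret_password_store_sync (&example_schema,
+                                     SECRET_COLLECTION_DEFAULT,
+                                     "Mail account password",
+                                     password,
+                                     NULL /* cancellable */, error,
+                                     "server", server,
+                                     "user", user,
+                                     NULL);
+}
+```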
+ +As the system explicitly does not support full-disk encryption (for +performance reasons), user secrets and passwords should be stored via +the freedesktop.org [Secrets D-Bus API][D-Bus-secret-service], rather than the preferences +or persistence APIs. The Secrets D-Bus API explicitly handles encryption +of the secret store, whereas a general design for a preferences system +should have no need for encryption, and hence adding it to the API would +be an unnecessary complication for 90% of the use cases. Accordingly, +confidential data will not be considered in the approach below. + +For further discussion and designs on the topic of secrets and +passwords, see the [Security design document](security.md). + +## Preferences approach + +### Overall architecture + +Access to app, user and system settings should be through the GSettings +API, most likely backed by dconf. (Refer to [][GNOME Linux desktop] for an overview +of the way GSettings and dconf fit together.) As system settings are +defined as those settings which are accessed by multiple components, +settings which are solely for the use of a single system service may be +stored in other ways, and are beyond the scope of this document. + +Each component should have its own GSettings schema: + + - **App schemas**: In the form net.example.MyApplication.SchemaName. + Each app may have zero or more schemas, but all must be prefixed by + the app ID (in this case, net.example.MyApplication; see the + Applications Design document for details on the application ID + scheme) to provide a level of namespacing. + + - **User schemas**: These may have any form, and will typically re-use + existing cross-desktop schemas, such as org.gnome.system.locale, as + these are supported by many existing software components used by + Apertis. + + - **System schemas**: These may have any form, similarly. + +Schema files for apps should be packaged with their app. For user +services, they could be packaged with the most relevant service, or in a +general purpose gsettings-desktop-schemas package (adapted from Debian) +and an accompanying apertis-schemas package for Apertis-specific +schemas. + +All reads and writes of all settings should go through the normal +GSettings interface — leaving access controls and policy to be +implemented in the backend. App code therefore does not need to treat +reads and writes differently, or treat app, user and system settings +differently. + +The use of GSettings also means that a single schema may be instantiated +at multiple schema paths. Typically, a schema will only be instantiated +at the path matching its ID; but a *relocatable* schema may be +instantiated at other paths. This can be used to store settings for +multiple accounts, for example. + +It is expected that each app will handle any upgrades to its preference +schemas, for example from one major version of the app to the next +([][System and app bundle upgrades]). Apertis will not provide any special APIs for this. As +this is highly dependent on the structure of the preference keys an app +is storing, Apertis can provide no recommendations here. Note, however, +that GSettings is designed with upgradability in mind: new preference +keys take their value from the schema-provided defaults until the user +sets them; the values for old preferences which are no longer in the +schema are ignored. 
It is recommended that the type or semantics of a +given GSettings key is not changed between versions of an app bundle — +if it needs to be changed, stop using the old key, migrate its stored +value to a new key, and use the new key in newer versions of the app +bundle. + +#### Requirements + +Through the use of the GSettings API, the following requirements are +automatically fulfilled: + + - [][Writability] — using g\_settings\_is\_writable() + + - [][System and app bundle upgrades] — old keys are either kept, or superseded by new + keys with migrated values if their type or semantics change + + - [][Factory reset] — for individual keys, using + g\_settings\_reset(); support for resetting entire schemas needs to + be supported by the designs below + + - [][Abstraction level] — GSettings serves as the abstraction layer, + with the individual backends below adding no further abstractions + + - [][Transactional updates] — GSettings provides + g\_settings\_delay(), g\_settings\_apply() and g\_settings\_revert() + to implement in-memory transactions which are serialised in the + backend on calling apply + + - [][Concurrency control] — g\_settings\_get() automatically returns the + default value if no user-set value exists; there is no atomic API + for setting settings + + - [][User interface] — g\_settings\_bind() can be used to bind a + GSettings key to a particular UI widget, allowing interface UIs to + be built easily (noting the argument in [][User interface] that preferences + UIs should not be automatically generated) + +Other requirements are fulfilled separately: + + - [][Control over user interface] — by generating preferences + windows from GSettings schemas in the system preferences application + ([][Searchable preferences]) + + - [][Rearrangeable preferences] — by hard-coding more + behaviour in the system preferences application ([][User interface]) + + - [][Searchable preferences] — searching over summaries and + descriptions in GSettings schemas ([][Security policy]) + + - [][Storage of user secrets and passwords] — using the + freedesktop.org Secrets D-Bus API as in the Security design (section + 5) + + - [](#preferences-hard-key) — implemented according to the Hard Keys + design ([](#preferences-hard-key1)) + +### Proxied dconf backend + +In its current state (May 2015, detailed in [][GNOME Linux desktop]), dconf does not +support the necessary fine-grained access controls for multiple +components accessing the preferences. However, a design is being +implemented upstream to proxy access to dconf through a separate service +which imposes access controls based on AppArmor ([mostly implemented as +of January 2016][appservice]). 
+
+On the assumption that this work can be completed and integrated into
+Apertis on an appropriate timescale (see [][Summary of recommendations]),
+this leads to a design where the dconf daemon runs as
+a system service, storing all settings in one database file per default
+layer:
+
+  - **App database**:
+    /Applications/*net.example.MyApplication*/*username*/config/dconf/app
+
+  - **User database**: ~/.config/dconf/user
+
+  - **System database**: /etc/dconf/db/local
+
+This would be implemented as the dconf profile:
+
+```
+user-db:user
+file-db:/Applications/net.example.MyApplication/username/config/dconf/app
+system-db:local
+```
+
+All accesses to dconf would go through GSettings, and then through the
+proxy service which applies AppArmor rules to restrict access to
+specific settings, implementing the chosen security policy ([][Access permissions]).
+The rules may, for example, match against settings path and the
+AppArmor label of the calling process.
+
+The proxy service would therefore implement a system preferences
+service.
+
+[Vendor lockdown](#vendor-lockdown) is supported already by [dconf]
+through the use of lockdown files, which specify particular keys or
+settings sub-trees which may not be modified.
+
+[Rollback][Rollback] is supported by having one database file per
+(user, app) pair, which can be snapshotted and rolled back using the
+normal app snapshot mechanism described in the Applications Design.
+dconf will detect the rollback of the database and reload it.
+
+Resetting all system settings would be a matter of deleting the
+appropriate databases — the keys in that database will revert to the
+default values provided by the schema files. As this is a simple
+operation, it does not have to be implemented centrally by a preferences
+service. Resetting the value of an individual key is supported by the
+g\_settings\_reset() API, which is already implemented as part of
+GSettings.
+
+The existing Apertis system puts
+
+```
+#include <abstractions/gsettings>
+```
+
+in several of the AppArmor profiles, which gives unrestricted access to
+the user dconf database. This must change with the new system, only
+allowing the dconf daemon access to the database.
+
+#### Requirements
+
+This design fulfills the following requirements:
+
+  - [][Access permissions] — through use of the proxy service and
+    AppArmor rules
+
+  - [][Rollback] — by rolling back the user’s per-app database
+
+  - [][Factory reset] — by deleting the user’s database or the user’s
+    per-app database
+
+  - [][Minimising io bandwidth] — dconf’s database design is optimised
+    for this
+
+  - [][Atomic updates] — dconf performs atomic overwrites of the
+    database
+
+  - [][Performance tradeoffs] — dconf is heavily optimised for reads
+    rather than writes
+
+  - [][Data size tradeoffs] — dconf uses GVDB for storage, so can
+    handle small to large amounts of data
+
+  - [][Vendor overrides] — dconf supports vendor overrides inherently
+
+  - [](#vendor-lockdown) — dconf supports vendor lockdown inherently
+
+### Development backend
+
+In the interim, we recommend that the standard dconf backend be used to
+store all system, user and app settings. This will *not* allow for
+access controls to be applied to the settings ([][Access permissions]), but
+will allow for app development against the final GSettings interface.
+
+Once the proxied dconf backend is ready, it can be packaged and the
+system configuration changed — no changes should be necessary in user
+services or apps to make use of the changed backend.
+
+This development backend would support vendor lockdown as normal. It
+would support resetting all settings at once, but would not support
+resetting an individual app’s settings (or rolling them back)
+independently of other apps, as all settings are stored in the same
+dconf database file.
+
+#### Requirements
+
+This design fails the following requirements:
+
+  - [][Access permissions] — **unsupported** by the current version of
+    dconf
+
+  - [][Rollback] — **unsupported** by the current version of dconf
+
+It supports the following requirements:
+
+  - [][Factory reset] — **partially supported** by deleting the user’s
+    database; resetting a (user, app) pair is not supported as all
+    settings are stored in the same dconf database file
+
+  - [][Minimising io bandwidth] — dconf’s database design is optimised
+    for this
+
+  - [][Atomic updates] — dconf performs atomic overwrites of the
+    database
+
+  - [][Performance tradeoffs] — dconf is heavily optimised for reads
+    rather than writes
+
+  - [][Data size tradeoffs] — dconf uses GVDB for storage, so can
+    handle small to large amounts of data
+
+  - [][Vendor overrides] — dconf supports vendor overrides inherently
+
+  - [](#vendor-lockdown) — dconf supports vendor lockdown inherently
+
+### Key-file backend
+
+As an alternative, if it is felt that the development backend is too
+simplistic to use in the interim before the proxied dconf backend is
+ready, the GSettings key-file backend could be used. This would allow
+enforcement of access controls via AppArmor, at the cost of:
+
+  - lower read performance, as GKeyFile is not optimised for reads (or
+    for performance in general);
+
+  - requiring code changes in user services and apps to switch from the
+    key-file backend to the proxied dconf backend once it's ready;
+
+  - requiring settings values to be migrated from the key-file store to
+    dconf at the time of switch over;
+
+  - not supporting vendor lockdown or vendor overrides.
+
+Due to the need for code changes to switch away from this backend to a
+more suitable long-term solution such as the proxied dconf backend, we
+do not recommend this approach.
+
+In detail, the approach would be to use a separate key file for each
+schema instance, across all system services, user services and apps.
+This would require using g\_settings\_key\_file\_backend\_new() and
+g\_settings\_new\_with\_backend\_and\_path() to manually construct the
+GSettings instance for each schema, using a key file path which
+corresponds to the schema path (see the sketch below).
+
+Access control for each schema instance would be enforced using AppArmor
+rules which restrict access to each key file as appropriate. For
+example, apps would be given read-only access to the key files for
+system and user settings, and read–write access to the key file for
+their own app settings.
+
+Vendor lockdown would be supported by vendors patching the AppArmor
+files to limit write access to specific schema instances. It would not
+support per-key lockdown at the granularity supported by dconf.
+
+This code for creating the GSettings object could be abstracted away by
+a helper library, but the API for that library would have to be stable
+and supported indefinitely, even after changing the backend.
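+
+For illustration, the manual construction described above might look
+as follows. The file path, root path and schema ID are hypothetical;
+g\_settings\_key\_file\_backend\_new() and
+g\_settings\_new\_with\_backend\_and\_path() are the GIO calls named
+earlier:
+
+```c
+#define G_SETTINGS_ENABLE_BACKEND
+#include <gio/gio.h>
+#include <gio/gsettingsbackend.h>
+
+GSettings *
+my_app_create_settings (void)
+{
+  /* One key file per schema instance; AppArmor rules would then be
+   * written against this specific path. */
+  GSettingsBackend *backend =
+      g_settings_key_file_backend_new ("/var/lib/myapp/net.example.MyApp.keyfile",
+                                       "/net/example/MyApp/",
+                                       NULL);
+
+  /* Construct the GSettings object against that backend, rooted at
+   * the path corresponding to the schema. */
+  GSettings *settings =
+      g_settings_new_with_backend_and_path ("net.example.MyApp",
+                                            backend,
+                                            "/net/example/MyApp/");
+
+  g_object_unref (backend);
+  return settings;
+}
+```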
+
+#### Requirements
+
+This design fails the following requirements:
+
+  - [][Performance tradeoffs] — GKeyFile is **equally non-optimised**
+    for reads and writes
+
+  - [][Vendor overrides] — **unsupported** by GKeyFile
+
+  - [](#vendor-lockdown) — **unsupported** by GKeyFile
+
+It supports the following requirements:
+
+  - [][Access permissions] — supported by AppArmor rules on the
+    per-schema key files
+
+  - [][Rollback] — by snapshotting and rolling back the appropriate key
+    files
+
+  - [][Factory reset] — by deleting the appropriate key files
+
+  - [][Minimising io bandwidth] — GKeyFile’s I/O bandwidth is
+    proportional to the number of times each key file is loaded and
+    saved
+
+  - [][Atomic updates] — GKeyFile performs atomic overwrites of its
+    key files
+
+  - [][Data size tradeoffs] — GKeyFile’s load and save performance is
+    proportional to the amount of data stored in the file, so it is
+    suitable for small amounts of data
+
+### Security policy
+
+All three potential backends enforce security policy through per-app
+AppArmor rules (if they support implementing security policy at all —
+the [][Development backend] does not).
+
+It is beyond the scope of this document to define how each app ships its
+AppArmor rules, and how Apertis can guarantee that third-party apps
+cannot grant themselves higher privileges using additional rules. The
+suggestion in section 8.3 of the Applications Design document is for the
+AppArmor rule set for an app to be automatically generated from the
+app’s manifest file by the app store (which is trusted). The manifest
+file could contain permissions such as ‘can-change-locale’ or
+‘can-add-network’ which would translate to AppArmor rules allowing an
+app write access to the relevant user and system settings.
+
+Additionally, by generating AppArmor rules from an app’s manifest, the
+precise format of the AppArmor rules is abstracted, allowing the
+preferences backend to be switched in future (just as app access to
+preferences is abstracted through GSettings).
+
+### User interface
+
+Different options for building preferences user interfaces need to be
+supported by the system ([][Control over user interface]):
+
+  - Individual preferences embedded at different points in the
+    application UI.
+
+  - A preferences window implemented within the application.
+
+  - A system preferences application which controls displaying the
+    preferences for all installed applications, plus system preferences.
+
+In all cases, we recommend that preferences are defined using GSettings
+schemas, as discussed in [][Overall architecture], and that settings are read and
+written through the [GSettings] API. This ensures that access
+control is enforced, and separates the structure of the preferences
+(including types and default values) from their presentation.
+
+The choice of how preferences are presented ultimately lies with the
+vendor. In certain cases, an application may choose to display a
+preference embedded into its UI (for example, as a
+satellite/hybrid/standard view selector overlaid on a map view), if it
+makes sense for that preference to be displayed in-context as opposed to
+in a preferences window. This user experience is something which should
+be checked as part of app validation.
+
+The majority of preferences should be displayed in a separate
+preferences window. In order to allow this window to be embedded into a
+system preferences application if the vendor desires it, the preferences
+window must be automatically generated. This is because:
+
+  - arbitrary code from arbitrary applications must not be run in the
+    context of the system preferences application; and
+
+  - the system preferences application cannot be shipped with
+    manually-coded preferences windows for all applications which could
+    ever be installed.
+
+However, automatically generated UIs generally give a bad user
+experience, due to the limited flexibility a designer has over them, so
+are suitable only for basic preferences (such as toggle switches; see
+[][Discussion of automatically generated versus manually coded preferences UIs]).
+There may be cases where an application has a particular
+preference for which Apertis provides no suitable editing widget. In
+these infrequent cases, it must be possible for the system preferences
+application to launch a stand-alone preferences window from the
+application to set that particular preference.
+
+#### System preferences application
+
+If an application has preferences, it must give the path to the
+GSettings schema file which defines them in its application manifest.
+
+The system preferences application should display a list of applications
+as its initial screen, including entries for system preferences which it
+implements itself. The applications listed should be the ones whose
+manifests specify GSettings schema files, and the application name and
+icon should also be retrieved from the application manifest and
+displayed.
+
+If the user selects an application, a preferences window should be
+displayed which shows all the preferences in the application’s GSettings
+schema file. See [][Generating a preferences window from a GSettings schema file]
+for details of how this is done. Note
+that if the schema file defines multiple levels of schema, they should
+be presented as a hierarchy of pages, with preferences only being shown
+on leaf pages.
+
+As a system application, the system preferences application would have
+permission to read and write any application settings via GSettings, so
+forms part of the trusted computing base (TCB) for preferences.
+
+The vendor may choose the security policy for which users may edit
+system preferences (such as the language or background) — they could
+either allow all users to edit these, or only allow administrative users
+(such as the vehicle owner) to edit them. In the latter case, we
+recommend showing the entries for these preferences anyway, but making
+the widgets insensitive and presenting an authentication dialogue for
+the administrator to authenticate with before allowing the settings to
+be edited; see the [Multi-User Transactional Switching document](multiuser-transactional-switching.md).
+
+#### Per-application preferences windows
+
+If the vendor wishes to implement a user experience where each
+application shows its own preferences window, this should be implemented
+using the system preferences application in a different mode. A settings
+button or menu entry in the application should launch the system
+preferences application.
+
+It should support being launched with the name of a GSettings schema to
+show, and it would render a preferences window from that schema (see
+[][Generating a preferences window from a GSettings schema file]).
+If the schema file defines multiple levels of schema,
+they should be presented as a hierarchy of pages, with preferences only
+being shown on leaf pages. It is up to the vendor whether the user can
+navigate ‘up’ from the top level of the schema to a list of all
+applications.
+
+As the system preferences application is part of the TCB for
+preferences, it must not allow an application to launch it with the name
+of a GSettings schema file which does not belong to that application.
+Otherwise, one application could trick the user into
+editing their preferences for another application.
+
+#### Generating a preferences window from a GSettings schema file
+
+A GSettings [schema file][GSettings-schema] can be turned into a UI using the
+following rules:
+
+  - A \<schema\> element is turned into a preference page. If it has an
+    extends attribute, the widgets from the schema it extends are added
+    to the preferences page first.
+
+  - The first non-relocatable \<schema\> element in a \<schemalist\>
+    will be taken as providing the preferences page for the application.
+    Subsequent \<schema\> elements will be ignored unless pulled in as
+    preferences sub-pages using a \<child\> element.
+
+  - A \<child\> element is turned into an entry to show a preferences
+    sub-page for the corresponding sub-schema. The label for this entry
+    should come from a new (non-standard) label attribute on the
+    \<child\> element.
+
+  - Relocatable \<schema\> elements (those without a path attribute) are
+    ignored unless pulled in as a preferences sub-page using a \<child\>
+    element.
+
+  - A \<key\> element is turned into a widget with its label set from
+    the \<summary\> element and its description set from the
+    \<description\> element. The type of widget is set by the type
+    attribute, which specifies a [GVariant type]:
+
+      - b (boolean): Switch or checkbox widget.
+
+      - y, n, q, i, u, x, t (integers): Integer spin button. Its range
+        is set to the smaller of the bounds of the integer type or the
+        values of the \<range\> element (if present).
+
+      - h (handle): Not supported.
+
+      - d (double): Floating point spin button. Its range is set to the
+        smaller of the bounds of the double type or the values of the
+        \<range\> element (if present).
+
+      - s (string): Text entry widget. If a \<choices\> element is
+        present, a drop-down box should be used instead, displaying the
+        options from the \<choice\> elements.
+
+      - o (object path): Not supported.
+
+      - g (type string): Not supported.
+
+      - ? (basic type): Not supported.
+
+      - v (variant): Not supported.
+
+      - a (array): Not supported in any form.
+
+      - m (maybe): Not supported in any form.
+
+      - (), r (tuple): Not supported in any form.
+
+      - {} (dictionary): Not supported in any form.
+
+      - \* (any): Not supported in any form.
+
+  - If a \<key\> element contains an enum attribute and no type
+    attribute, a drop-down box should be used, displaying the options
+    from the nick attributes of the \<value\> elements in the
+    corresponding \<enum\> element.
+
+  - If a \<key\> element contains a flags attribute and no type
+    attribute, a checkbox list should be used, displaying a checkbox for
+    each of the nick attributes of the \<value\> elements in the
+    corresponding \<flags\> element.
+
+  - If a key’s name attribute matches a mapping to a wizard application
+    (see [][Support for custom preferences windows]) in the application’s manifest, that key should
+    be displayed as a menu entry which, when selected, launches the
+    wizard application as a new window.
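+
+As an illustration of these rules, consider the following hypothetical
+schema file (the schema IDs, keys and wording are invented for this
+example; the label attribute on \<child\> is the non-standard addition
+proposed above). A generated window would contain a switch, a spin
+button limited to 0–10, a drop-down with two choices, and an entry
+leading to a sub-page for the child schema:
+
+```xml
+<schemalist>
+  <schema id="net.example.MyApp" path="/net/example/MyApp/">
+    <key name="notifications-enabled" type="b">
+      <default>true</default>
+      <summary>Enable notifications</summary>
+      <description>Whether the app may show notifications.</description>
+    </key>
+    <key name="volume" type="i">
+      <range min="0" max="10"/>
+      <default>5</default>
+      <summary>Alert volume</summary>
+    </key>
+    <key name="map-style" type="s">
+      <choices>
+        <choice value="standard"/>
+        <choice value="satellite"/>
+      </choices>
+      <default>'standard'</default>
+      <summary>Map style</summary>
+    </key>
+    <!-- Shown as an entry opening a preferences sub-page; ‘label’ is
+         the proposed non-standard attribute. -->
+    <child name="email" schema="net.example.MyApp.Email"
+           label="E-mail account"/>
+  </schema>
+  <schema id="net.example.MyApp.Email" path="/net/example/MyApp/email/">
+    <key name="imap-port" type="u">
+      <default>143</default>
+      <summary>IMAP port</summary>
+    </key>
+  </schema>
+</schemalist>
+```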
+
+#### Support for custom preferences windows
+
+If an application has a particularly esoteric preference or set of
+preferences which are not supported by the generated preferences UI (see
+[][Generating a preferences window from a GSettings schema file]),
+it may provide a ‘wizard’ application as part of its
+application bundle which allows setting those preferences (and only
+those preferences). For example, this could be used to show a ‘wizard’
+for configuring an e-mail account; or a map widget for selecting a
+location.
+
+A wizard application presents a single window of preferences, and its
+widgets cannot be integrated into a preferences window generated by the
+system preferences application — it must be launched using a menu entry
+from there.
+
+The wizard application must be listed in the application’s manifest as
+part of a dictionary which maps GSettings schemas or keys to commands to
+run.
+
+For example, a particular manifest could map the key
+`/org/foo/MyApp/complex-setting` to the command
+`my-app --show-complex-setting`. Or a manifest could map the schema
+`/org/foo/MyApp/EmailAccount` to the command
+`my-app --configure-email-account`.
+
+Application bundles which contain keys for this in their manifest should
+be subjected to extra app store validation checks, to establish that the
+wizard application’s UI is consistent with other preferences UIs, and
+that it does not implement preferences which should be handled by a
+generated UI.
+
+The wizard application must set the relevant preferences itself before
+exiting, and runs with the same privileges as the rest of the
+application bundle (so will only have access to that application’s
+preferences, as per [][Security policy]).
+
+It may be necessary for the window manager to treat windows from wizard
+applications specially, so that they appear more like a window which is
+part of the system preferences application than a window from a separate
+application. This can be solved by adding appropriate metadata to the
+wizard application windows so the window manager treats them
+differently.
+
+#### Searchability of preferences
+
+To allow the system preferences application to search over all
+applications’ preferences ([][Searchable preferences]), it must load all the
+GSettings schemas from applications whose manifests specify a schema.
+Searching must be performed over the user-visible parts of the schema
+(the \<summary\> and \<description\> elements), and results should be
+returned as a link to the relevant application preferences window.
+System preferences should be included in the search results too.
+
+#### Reorganising preferences
+
+Implementing arbitrary reorganisation of preferences ([][Rearrangeable preferences])
+is difficult, as that requires an OEM to know the semantics of all
+preferences for all possibly installable applications.
+
+We recommend that if an OEM wants to present a new group of a certain
+set of preferences, they must choose specific preferences from known
+applications, and implement a custom window in the system preferences
+application which displays those preferences. Each preference should
+only be shown if the relevant application is installed.
+
+An alternative implementation which is more flexible, but which devolves
+more control to application developers, is to tag each preference in the
+GSettings schemas with well-defined tags which summarise the
+preference’s semantics.
+For example, an application’s preference for
+whether to submit usage data to the application developer could be
+tagged as ‘privacy’; or a preference determining the colour scheme to
+use in an application could be tagged as ‘appearance’. The OEM could
+then implement a custom preferences window which queries all installed
+GSettings schemas for a specific tag and displays the resulting
+preferences. We do not recommend this option, as even with app store
+validation of the chosen tags, this would allow application developers
+too much control over the appearance of a system preferences window.
+
+#### Preferences list widget
+
+In order to help make all preferences UIs consistent (including those
+implemented by the vendor, [][System preferences application]; and those implemented by
+application developers as wizard applications, [][Per-application preferences windows]), Apertis
+should provide a standard widget which implements the conversion from
+GSettings schemas to UI as described in [][Generating a preferences window from a GSettings schema file].
+
+This widget should accept a list of GSettings schema paths to display,
+and may optionally accept a list of keys within those schemas to display
+(ignoring the others), or to ignore (displaying the others); and should
+display all those keys as preferences. It should implement reading and
+writing the keys’ values using the GSettings API, and must assume that
+the application has permission to do so (see [][Security policy]). It must check
+for writability of preferences and make them insensitive if they are
+read-only (see [](#vendor-lockdown1)). It cannot give the application more
+permissions than it already has.
+
+If application developers use this widget, the vendor can ensure that
+preferences UIs are consistent between applications and the system
+preferences application through the theming of the widget.
+
+#### Vendor lockdown
+
+If the vendor locks down a key in a GSettings schema for an application
+(or system preference) ([](#vendor-lockdown) — supported by [][Proxied dconf backend]
+and [][Development backend], but not [][Key-file backend]), that is enforced by the underlying settings service
+(most likely dconf), and cannot be overridden or worked around by
+applications.
+
+However, it is up to applications to reflect whether a preference is
+read-only (due to being locked down) in their UIs. This is typically
+achieved by hiding a preference or making its widget insensitive.
+Applications can use the [g_settings_is_writable] method to
+determine whether a preference is read-only. Any preferences widgets
+provided by Apertis ([][Preferences list widget]) must implement this already.
+
+If an application developer uses a custom widget to display a
+preference, and forgets to check whether that preference is read-only,
+their application might enter an inconsistent state (which is their
+fault), but the system will not let that preference be written.
+Convenience APIs like [g_settings_bind_writable] can reduce the
+risk of this happening.
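+
+As a sketch of this pattern, assuming a hypothetical boolean key bound
+to the ‘active’ property of a GtkSwitch:
+
+```c
+#include <gtk/gtk.h>
+#include <gio/gio.h>
+
+static void
+bind_notifications_preference (GSettings *settings, GtkWidget *sw)
+{
+  /* Keep the switch in sync with the boolean key in both directions. */
+  g_settings_bind (settings, "notifications-enabled",
+                   sw, "active",
+                   G_SETTINGS_BIND_DEFAULT);
+
+  /* Desensitise the widget whenever the key is locked down by the
+   * vendor, so the UI reflects that the preference is read-only. */
+  g_settings_bind_writable (settings, "notifications-enabled",
+                            sw, "sensitive",
+                            FALSE);
+}
+```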
+
+#### Discussion of automatically generated versus manually coded preferences UIs
+
+In an ideal world, we would recommend manually coded preferences UIs:
+while automatically generating preference UIs can rapidly produce rough
+drafts, in our experience it can never result in a high-quality
+finished UI with:
+
+  - logically grouped options;
+
+  - correctly aligned controls;
+
+  - a concept of which preferences are most important, which ones are
+    ‘advanced’, and which ones should be hidden;
+
+  - conditional defaults (for example, when you set up IMAP e-mail, the
+    default port should be 143, except if you have selected old-style
+    SSL in which case it should be 993); and
+
+  - the ability to hide or disable preferences that do not apply because
+    of the value of another preference (for example, if you switch off
+    Bluetooth completely, then the widget to change the name that is
+    broadcast over Bluetooth should be hidden or disabled).
+
+If the uniform appearance of preferences UIs is a concern, we believe
+this should be addressed through: convention; the default appearance of
+widgets in the UI toolkit; and the use of a set of human interface
+guidelines such as the [GNOME HIG]. Specifically, we recommend that
+preferences are:
+
+  - integrated into the main application UI if there are only a small
+    number of them;
+
+  - [instant-apply] unless doing so would be dangerous, in which
+    case they should be explicit-apply for all preferences in the
+    dialogue (for example, changing monitor resolutions is dangerous,
+    and hence is explicit-apply); and
+
+  - grouped logically in the UI.
+
+If, after the preferences UIs of several applications have been
+implemented, some common widget patterns have been identified, we
+suggest that they could be abstracted out into new widgets in the UI
+toolkit. The goal of this would be to increase consistency between
+preferences UIs, without implementing essentially a separate UI toolkit
+for them, which would be the result of any template- or
+auto-generation-based approach.
+
+An alternative way of thinking about this is that preferences are
+subject to a model–view split (the model is GSettings schema files; the
+view is the preferences UI), and it is typically inadvisable to generate
+a view from a model when following that pattern.
+
+However, we realise that the goal of having a unified system preferences
+application with a consistent appearance (which is enforced) conflicts
+with these recommendations, and hence these recommendations are not part
+of our overall suggested approach.
+
+### Preferences hard key
+
+A preferences hard key must be supported as detailed in the Hard Keys
+design. In a configuration where a system preferences application is
+used, it must launch that application, already open on the preferences
+window for the active application. If no application is active, or if
+the currently active application has no GSettings schemas listed in its
+manifest file, the main page of the system preferences application
+should be shown.
+
+In a configuration where applications implement their own preferences
+windows, the active application must be sent a ‘hard key pressed’ signal
+for the preferences hard key, which the application can handle how it
+wishes (i.e. by showing its preferences window). If there is no active
+application, the system preferences application (which in this
+configuration only contains system preferences) should be shown.
+
+The policy for exactly what happens in each situation and configuration
+is under the control of the hard keys service, which is provided by the
+vendor. It should have access to the manifest for the active application
+so it can find information about GSettings schemas.
+
+### Existing preferences schemas
+
+As GSettings is used widely within the open source software components
+used by Apertis, particularly GNOME, there are many standard GSettings
+schemas for common user settings. We recommend that Apertis re-use these
+schemas as much as possible, as support for them has already been
+implemented in various components. If that is not possible, they could
+be studied to ensure we learn from their design successes or failures.
+
+  - org.gnome.system.locale
+
+  - org.gnome.system.proxy
+
+  - org.gnome.desktop.default-applications
+
+  - org.gnome.desktop.media-handling
+
+  - org.gnome.desktop.interface
+
+  - org.gnome.desktop.lockdown
+
+  - org.gnome.desktop.background
+
+  - org.gnome.desktop.notifications
+
+  - org.gnome.crypto
+
+  - org.gnome.desktop.privacy
+
+  - org.gnome.system.dns\_sd
+
+  - org.gnome.desktop.sound
+
+  - org.gnome.desktop.datetime
+
+  - org.gnome.system.location
+
+  - org.gnome.desktop.thumbnailers
+
+  - org.gnome.desktop.thumbnail-cache
+
+  - org.gnome.desktop.file-sharing
+
+Various Apertis dependencies (for example, Mutter, Tracker, libfolks,
+IBus, Geoclue, Telepathy) use their own GSettings schemas already — as
+these are not shared, they are not listed.
+
+*Alternative model:* If the locale is a system setting, rather than a
+user setting, systemd's [localed] should be used. This would require
+the locale to be changed via the localed D-Bus API, rather than
+GSettings, which would affect the implementation of the system
+preferences app.
+
+## Persistent data approach
+
+### Overall architecture
+
+As discussed in sections 5.3.1 and 7 of the Applications Design, and the
+Multiuser Design, there is a difference between state which an app needs
+to persist (for example, if it is being terminated to switch users), and
+state which an app explicitly needs to share (for example, if a
+transactional user switch is taking place to execute an action as a
+different user). The Multiuser Design encourages app authors to think
+explicitly about these two sets of state, and the differences between
+them. It is the app which chooses the state to persist, rather than the
+operating system — storage space is too limited to persist the entire
+address space of an app, effectively suspending it.
+
+The state each app chooses to persist will differ, and cannot be
+predicted by Apertis. There could be a lot of state, or very little. It
+could be representable as a simple key–value dictionary, or might have a
+complex hierarchical structure.
+
+### Well-known state directories
+
+As mentioned in the Applications Design document (sections 5.3.1 and 7),
+we recommend that Apertis provide a per-(user, app) directory for
+storage of persisted data, and a public API the app can call to find out
+that directory. The API should differentiate between cache and non-cache
+state, with cache state going in $XDG\_CACHE\_HOME/*net.example.MyApp*/
+and non-cache state going in $XDG\_DATA\_HOME/*net.example.MyApp*/.
+Alternatively, as suggested in the Applications Design, the latter could
+be /Applications/*net.example.MyApp*/Storage/*username*/state/.
+This has
+the advantage of allowing all data for a particular app to be removed by
+deleting /Applications/net.example.MyApp, at the cost of not following
+the XDG standard used by most existing software. This fulfils the
+factory reset requirement ([][Factory reset]).
+
+The former is effectively equivalent to a per-(user, app)
+XDG\_CACHE\_HOME directory, and the latter to an XDG\_DATA\_HOME, as
+defined by the [XDG Base Directory Specification][XDG-base-dir-spec].
+
+AppArmor rules should exist to allow apps to write to these directories
+(and not to other apps’ state directories). This is the extent of the
+security needed, as state storage is simply an interaction between an
+app and the filesystem.
+
+This approach automatically allows for rollback of persistent data
+([][Rollback]) using the normal snapshotting mechanism described in
+the Applications Design document.
+
+As with preferences, app bundles must be in charge of upgrading their
+own persistent data when the system is upgraded (or the app is upgraded)
+([][System and app bundle upgrades]). Recommendations are given in the subsections below.
+
+### Recommended serialisation APIs
+
+As each app’s state storage requirements are different, we suggest that
+Apertis provide several recommended serialisation APIs, and allow apps
+to choose the most appropriate one — or something completely different
+if that fulfils their requirements better.
+
+Alongside, Apertis should provide guidelines to app developers to allow
+them to choose an appropriate serialisation API, and avoid common
+problems in serialisation:
+
+  - minimise writes to main storage ([][Minimising io bandwidth]);
+
+  - ensure all updates to stored state are atomic (requirement
+    [][Atomic updates]); and
+
+  - ensure transactions are used for groups of updates where appropriate
+    ([][Transactional updates]).
+
+> Atomic in the sense that either the old or the new state is stored in
+> its entirety, rather than some intermediate state, if power is lost
+> part-way through an update.
+
+Depending on the requirements that apps are believed to have, some
+or all of the following APIs could be recommended for serialising state
+to main storage. For comparison, Android only provides a generic file
+storage API, and an SQLite API, with no implemented [key–value store
+APIs][Android-persistent-data]. Apps must implement those
+themselves.
+
+#### GKeyFile
+
+[*https://developer.gnome.org/glib/stable/glib-Key-value-file-parser.html*](https://developer.gnome.org/glib/stable/glib-Key-value-file-parser.html)
+
+Suitable for small amounts of key–value state with simple types.
+
+All updates to a GKeyFile are atomic, as it uses the atomic-overwrite
+technique: the new file contents are written to a temporary file, which
+is then atomically renamed over the top of the old file. Transactional
+updates can be implemented by saving the key file to apply the
+transaction, and discarding the in-memory GKeyFile object to revert it.
+
+The amount of I/O with a GKeyFile is small, as the amount of data which
+should be stored in a GKeyFile is small, and the file is only written
+out when explicitly requested by the app.
+
+System upgrades have to be handled manually by app bundles — if the
+persistence data format has to change, the app must migrate data from
+the old format to the new format the first time it is run after an
+upgrade.
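+
+A minimal sketch of such a startup migration check, using GKeyFile's
+atomic save (the file path, group name and version numbers are
+hypothetical):
+
+```c
+#include <glib.h>
+
+#define CURRENT_FORMAT_VERSION 2
+
+static void
+my_app_load_state (const gchar *path)
+{
+  g_autoptr(GKeyFile) key_file = g_key_file_new ();
+
+  /* A missing file is fine on first run; start from an empty key file. */
+  g_key_file_load_from_file (key_file, path, G_KEY_FILE_NONE, NULL);
+
+  gint version = g_key_file_get_integer (key_file, "State", "Version",
+                                         NULL);
+
+  if (version < CURRENT_FORMAT_VERSION)
+    {
+      /* Migrate old keys to the new format here, then record the new
+       * format version. */
+      g_key_file_set_integer (key_file, "State", "Version",
+                              CURRENT_FORMAT_VERSION);
+
+      /* Atomic overwrite: written to a temporary file, then renamed
+       * over the old file. */
+      g_key_file_save_to_file (key_file, path, NULL);
+    }
+}
+```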
+To support this, it is recommended that all GKeyFiles used for
+persistent data contain a ‘Version’ key specifying the data format
+version in use.
+
+#### GVDB
+
+[*https://git.gnome.org/browse/gvdb*](https://git.gnome.org/browse/gvdb)
+
+Memory-mapped hash table with [GVariant]-style types, suitable for
+small to large amounts of data which are read much more frequently than
+they are written. This is what dconf uses for storage.
+
+All updates to a GVDB file are atomic, as it uses the same
+atomic-overwrite technique as [][GKeyFile]. Transactions are
+supported similarly — by writing out the updated database to apply the
+transaction, or discarding the in-memory changes to revert it.
+
+The amount of I/O for reads from a GVDB file is small, as it memory-maps
+the database, so only pages in the data it actually reads (plus some
+metadata). Writes require the entire file to be updated, but are only
+done when explicitly requested by the app.
+
+GVDB supports per-file versioning (though this is not currently exposed
+in the public API). This can be used for handling system upgrades
+([][System and app bundle upgrades]) — the database must be explicitly migrated from an old
+version to a new version when an upgraded app is first
+started.
+
+#### SQLite
+
+[*http://sqlite.org/*](http://sqlite.org/)
+
+[*https://wiki.gnome.org/Projects/Gom*](https://wiki.gnome.org/Projects/Gom)
+
+Full SQL database implementation, supporting simple SQL types and more
+complex relational types if implemented manually by the app. Suitable
+for medium to large amounts of data which are read and written
+frequently. It supports SQL transactions.
+
+SQLite is not a panacea. It is designed for the specific use pattern of
+SQL databases with indexes and relational tables, with frequent reads
+and writes, and infrequent deletions of data. Apps will only get the
+best performance from SQLite by defining their own table structure,
+indices and relations; imposing a common key–value-style API on top of
+SQLite would give lower performance.
+
+SQLite has limited support for SQL schema upgrades with its [ALTER TABLE][SQLite-alter-table]
+statement, which supports renaming tables and adding new columns
+to tables. Apps must implement their own data migration from old to new
+versions of their database schema; documenting this is beyond the scope
+of this design.
+
+Apps should only use SQLite if they have considered issues like their
+vacuuming policy — how frequently to vacuum the database after deleting
+data from it. See:
+
+  - [*https://blogs.gnome.org/jnelson/2015/01/06/sqlite-vacuum-and-auto\_vacuum/*](https://blogs.gnome.org/jnelson/2015/01/06/sqlite-vacuum-and-auto_vacuum/)
+
+  - [*https://wiki.mozilla.org/Performance/Avoid\_SQLite\_In\_Your\_Next\_Firefox\_Feature*](https://wiki.mozilla.org/Performance/Avoid_SQLite_In_Your_Next_Firefox_Feature)
+
+If using GObjects to represent entries in an SQLite database, the
+[GOM] wrapper around SQLite may be useful to simplify code.
+
+#### GNOME-DB
+
+[*http://www.gnome-db.org/*](http://www.gnome-db.org/)
+
+This is **not** recommended. It is an abstraction layer over multiple
+SQL database implementations, allowing apps to access remote SQL
+databases. In almost all cases, directly using [][Sqlite] is a more
+appropriate choice.
+
+### When to save persistent data
+
+As specified in the Applications Design (section 5.3.1), state is saved
+to main storage at times chosen by both the operating system and the
+app.
+The operating system knows when the logged-in user is about to
+change, or when the system is about to be shut down; the app knows when
+it has changed some of its persistent state in memory, and hence needs
+to write it out to main storage.
+
+An action could be implemented in each app which is triggered by the
+ActivateAction method of the org.freedesktop.Application [D-Bus
+interface][DBus-desktop-entry] if, for example, that interface is implemented by apps.
+When triggered, this action would cause the app to store its persistent
+state.
+
+### Recently used and favourite items
+
+Section 6.3 of the Global Search Design specifies that an API for apps
+to store their favourite and recently used items in will be provided. As
+this is data shared from an app to the operating system, and is
+typically append-only rather than strongly read–write, we recommend that
+it be designed separately from the persistent data API covered in this
+document, following the recommendations given in the Global Search
+Design document.
+
+## Summary of recommendations
+
+As discussed in the above sections, we recommend:
+
+  - Splitting preferences, persistent data storage and confidential data
+    storage ([][Approach]).
+
+  - Providing one API for preferences: GSettings ([][Overall architecture]).
+
+  - Apps provide a GSettings schema file for their preferences, named
+    after the app ([][Overall architecture]).
+
+  - Existing GSettings schemas are re-used where possible for user and
+    system settings ([][Existing preferences schemas]).
+
+  - Using the normal GSettings approach for handling app upgrades
+    ([][Overall architecture]).
+
+  - Developing against the normal dconf backend for GSettings
+    ([][Development backend]).
+
+  - Switching to the proxied dconf backend once it’s ready, to support
+    access control ([][Proxied dconf backend]).
+
+  - A key-file backend is an alternative we do *not* recommend ([][Key-file backend]).
+
+  - Permissions to modify user or system settings are controlled by the
+    app’s manifest ([][Security policy]).
+
+  - Permissions are converted to backend-specific AppArmor rules by the
+    app store ([][Security policy]).
+
+  - User interfaces for preferences are provided by the vendor,
+    automatically generated from GSettings schemas; or provided by
+    applications ([][User interface]).
+
+  - Apertis provides a standard widget to present GSettings schemas as a
+    preferences UI ([][Preferences list widget]).
+
+  - Preferences hard key support is added according to the Hard Keys
+    design ([](#preferences-hard-key)).
+
+  - Providing API to get a persistent data storage location ([][Well known state directories]).
+
+  - Persistent data is private to each (user, app) pair ([][Well known state directories]).
+
+  - Recommending various different data storage APIs to suit different
+    apps’ use cases ([][Recommended serialisation APIs]).
+
+  - Apps explicitly define which data will persist, and are responsible
+    for saving it and migrating it from older to newer versions ([][Overall architecture]).
+
+  - Apps can be instructed to save their persistent state by the
+    operating system via a D-Bus interface ([][When to save persistent data]).
+
+  - User secrets and passwords are stored using the freedesktop.org
+    Secrets D-Bus API, not the Apertis preferences or persistence APIs
+    ([][Approach]).
+ +[GSettings]: https://developer.gnome.org/gio/stable/GSettings.html#GSettings.description + +[GSettings-relocatable]: https://developer.gnome.org/gio/stable/GSettings.html#gsettings-relocatable + +[AppArmor]: http://apparmor.net/ + +[TTCTTOU]: http://en.wikipedia.org/wiki/Time_of_check_to_time_of_use + +[dconf]: https://developer.gnome.org/dconf/unstable/dconf-overview.html + +[dconf-lockdown]: https://developer.gnome.org/dconf/unstable/dconf-overview.html#id-1.2.7 + +[XDG-base-dir-spec]: http://standards.freedesktop.org/basedir-spec/basedir-spec-latest.html + +[Secret Service]: http://standards.freedesktop.org/secret-service/ + +[Secret Service API]: http://standards.freedesktop.org/secret-service/pt02.html + +[Android-shared-pref]: http://developer.android.com/guide/topics/data/data-storage.html#pref + +[Android-prefs-def]: http://developer.android.com/guide/topics/ui/settings.html#DefiningPrefs + +[Android-pref-fragment]: http://developer.android.com/guide/topics/ui/settings.html#Fragment + +[Android-persistent-data]: http://developer.android.com/guide/topics/data/data-storage.html + +[Android-activity]: http://developer.android.com/training/basics/activity-lifecycle/recreating.html + +[Android-account-API]: http://developer.android.com/reference/android/accounts/AccountManager.html + +[Android-settings-question]: http://stackoverflow.com/questions/785973/what-is-the-most-appropriate-way-to-store-user-settings-in-android-application/786588#786588 + +[iOS-key-values]: https://developer.apple.com/library/ios/documentation/CoreFoundation/Conceptual/CFPreferences/CFPreferences.html#//apple_ref/doc/uid/10000129-SW1 + +[iOS-pref-domains]: https://developer.apple.com/library/ios/documentation/CoreFoundation/Conceptual/CFPreferences/Concepts/PreferenceDomains.html + +[iOS-prop-lists]: https://developer.apple.com/library/ios/documentation/CoreFoundation/Conceptual/CFPropertyLists/CFPropertyLists.html#//apple_ref/doc/uid/10000130i + +[iOS-pref-practices]: https://developer.apple.com/library/ios/documentation/CoreFoundation/Conceptual/CFPreferences/Concepts/BestPractices.html#//apple_ref/doc/uid/TP30001219-118191 + +[iOS-NSUserDefaults]: https://developer.apple.com/library/ios/documentation/Cocoa/Reference/Foundation/Classes/NSUserDefaults_Class/index.html#//apple_ref/occ/cl/NSUserDefaults + +[iOS-system-defaults]: https://developer.apple.com/library/ios/documentation/Cocoa/Conceptual/UserDefaults/Preferences/Preferences.html#//apple_ref/doc/uid/10000059i-CH6-SW6 + +[iOS-app-dirs]: https://developer.apple.com/library/ios/documentation/FileManagement/Conceptual/FileSystemProgrammingGuide/AccessingFilesandDirectories/AccessingFilesandDirectories.html#//apple_ref/doc/uid/TP40010672-CH3-SW11 + +[iOS-graph-management]: https://developer.apple.com/library/prerelease/ios/documentation/DataManagement/Devpedia-CoreData/coreDataOverview.html#//apple_ref/doc/uid/TP40010398-CH28 + +[iOS-prop-nesting]: https://developer.apple.com/library/mac/documentation/Cocoa/Conceptual/PropertyLists/AboutPropertyLists/AboutPropertyLists.html + +[iOS-full-state-persist]: https://developer.apple.com/library/ios/featuredarticles/ViewControllerPGforiPhoneOS/PreservingandRestoringState.html + +[iOS-keychain]: https://developer.apple.com/library/ios/documentation/Security/Conceptual/keychainServConcepts/01introduction/introduction.html#//appl_ref/doc/uid/TP30000897-CH203-TP1 + +[GENIVI-peristence]: http://docs.projects.genivi.org/persistence-client-library/1.0/Persistence_ArchitectureManual.pdf + +[GENIVI-relational-API]: 
http://docs.projects.genivi.org/persistence-client-library/1.0/Persistence_ClientLibrary_UserGuide.pdf
+
+[D-Bus-secret-service]: http://standards.freedesktop.org/secret-service/
+
+[appservice]: https://git.collabora.com/cgit/user/xclaesse/appservice.git
+
+[GSettings-schema]: https://git.gnome.org/browse/glib/tree/gio/gschema.dtd
+
+[GVariant type]: https://developer.gnome.org/glib/stable/glib-GVariantType.html#id-1.6.18.6.9
+
+[g_settings_is_writable]: https://developer.gnome.org/gio/unstable/GSettings.html#g-settings-is-writable
+
+[g_settings_bind_writable]: https://developer.gnome.org/gio/stable/GSettings.html#g-settings-bind-writable
+
+[GNOME HIG]: https://developer.gnome.org/hig/stable/dialogs.html.en
+
+[instant-apply]: https://developer.gnome.org/hig/stable/dialogs.html.en#instant-and-explicit-apply
+
+[localed]: http://www.freedesktop.org/wiki/Software/systemd/localed/
+
+[GVariant]: https://developer.gnome.org/glib/stable/glib-GVariant.html
+
+[SQLite-alter-table]: https://www.sqlite.org/lang_altertable.html
+
+[GOM]: https://wiki.gnome.org/Projects/Gom
+
+[DBus-desktop-entry]: http://standards.freedesktop.org/desktop-entry-spec/desktop-entry-spec-latest.html#dbus
diff --git a/content/designs/procurement.md b/content/designs/procurement.md
new file mode 100644
index 0000000000000000000000000000000000000000..a303ffc6b6bfc677b47a809904ba3a43c5b56b2d
--- /dev/null
+++ b/content/designs/procurement.md
@@ -0,0 +1 @@
+# Apertis Procurement
diff --git a/content/designs/release-flow.md b/content/designs/release-flow.md
new file mode 100644
index 0000000000000000000000000000000000000000..f8e32037431a52ecf47602644929185ef28a27cf
--- /dev/null
+++ b/content/designs/release-flow.md
@@ -0,0 +1,674 @@
+---
+title: Release flow and product lines
+authors:
+  - name: Sjoerd Simons
+---
+
+# Introduction
+
+Apertis and its direct downstreams are intended as baseline distributions for further
+product development; as such it's important to have a clear definition of what
+downstreams further down the chain can expect in terms of releases and support cycles,
+in order to understand how to best use them in their
+product development cycles.
+
+The release cycles of Apertis and its direct downstreams are split into two big phases: a development
+phase, containing various development releases, followed by a product phase
+which contains various stable point releases. As is typical, the development
+phase is where new features are introduced and prepared, with each development
+release having only a relatively short support time, while during the product phase the
+focus is on stability, which comes with a longer support cycle, no new features
+and only updates for important bugfixes and security issues.
+
+This document sets out a well-defined process for both the development
+and product phases of Apertis and its direct downstreams, while
+ensuring the software taken from upstreams is recent and well-supported.
More
+specifically, this process tries to balance various trade-offs when
+integrating software from community-supported upstreams:
+* support baseline versions that also have community support
+  (to prevent the situation where, for instance, Apertis would need to provide
+  full security support for the base distribution and/or the Linux kernel);
+* ensure there is a reasonable window for users of Apertis and its direct downstreams to
+  rebase on top of a new version while the older baseline is still supported;
+* limit the number of simultaneously supported releases to minimise the
+  overall effort.
+
+In all cases it should be noted that the support timelines documented here are the
+expected default timelines: given enough interest, particular support
+cycles can be extended to fit the needs of downstreams.
+
+For the Apertis releases there are two important upstream projects that need to be
+taken into account: the Debian project, which is the main upstream
+distribution for Apertis, and the mainline Linux kernel. These will be
+looked at first, including the impact of their release process on
+generic downstreams, before looking at Apertis specifically.
+
+# Debian release processes
+
+Debian aims to do a new major release about every two years. These releases are
+_not_ time-based, but done when "ready" (defined as having no more issues
+tagged "release-critical"). Even
+so, the process is well understood and predictable. For more information see the
+[Debian release statistics](https://wiki.debian.org/DebianReleases#Release_statistics).
+
+For a downstream there are two important processes to understand. The first
+is the process towards a release, which impacts when downstream
+rebasing should start. The second is the maintenance process of a
+stable release, which impacts how to handle security and bugfixes coming from
+Debian to the downstream.
+
+A new stable Debian release is done roughly every two years.
+Each release gets 3 years of support before it is taken over by the
+LTS team, which provides another two years of security support before a release
+enters end of life (EOL). The following diagram shows the expected timeline for
+the current Debian release and the upcoming releases:
+
+
+
+## Process towards a release
+
+Debian's development is done in a suite called `unstable` (code-named `sid`).
+Developers directly upload packages into this suite. Once updated, packages
+stay in the `unstable` suite for some time (typically 10 days) and then they
+automatically get promoted to the `testing` suite as long as no
+release-critical bugs were found (and no other sanity check failed). The
+`testing` suite has the code-name of the *next* planned Debian release; at the
+time of this writing this is `buster`.
+
+The idea behind the `unstable` to `testing` progression is to ensure that during
+Debian development there is a version available that is shielded from the most
+serious regressions and can thus be used by a wider audience for testing and
+dogfooding. However, among Debian developers it is common to directly run
+`unstable` on a day-to-day basis.
+
+To go from "normal" development to a new release a freeze process is used.
+Specifically, the `testing` suite is frozen in various stages:
+* transition freeze: no updates that need a collection of packages to
+  transition into `testing` at once are allowed (e.g.
due to ABI breakage);
+* soft freeze: no new packages are allowed into testing anymore;
+* full freeze: only updates for release-critical issues are allowed.
+
+Typically this process takes around 7 months (plus or minus two months) to complete,
+with the transition freeze and soft freeze each taking about 1 month while the
+full freeze takes the remainder of the time. Even with the `testing` suite being
+held in a pretty stable state, the final freeze takes this amount of time due to
+the sheer size of Debian, the big increase in user testing once the freeze
+begins, and all the work that needs to be completed before release,
+such as finalising the documentation, installers, etc. The end result is a new stable
+release of a very high-quality Linux distribution.
+
+Once a release is done the `stable` suite is updated to refer to the new release,
+while `testing` is changed to refer to the next version (to be code-named
+`bullseye` at the time of writing).
+
+
+From the perspective of a downstream distribution such as Apertis it is
+important to note that even though during the Debian freeze there will be some
+outstanding release-critical bugs, only a subset of them will impact the
+downstream's use-case. As such, if scheduling allows, it is recommended
+to start rebasing on top of the *next* Debian stable release while Debian itself
+is in either soft or full freeze. This has the added benefit that the
+downstream distribution will already pre-test the upcoming Debian release, with
+the potential of being able to fix high-priority issues in Debian proper even
+before its release, thus lowering the delta maintained in the
+downstream distribution.
+
+## Process after release
+
+Once a release has been done, the newly released distribution will follow Debian's
+stable processes. Debian tends to do point releases roughly once every two months
+to include fixes for the latest security issues and high-priority bugs.
+This process is handled through various different package repositories.
+
+### Stable repository
+
+This is the main repository with the full current *released* version of Debian.
+After release this repository only gets updated when a point release happens.
+
+### Security repository
+
+This repository contains security updates on top of the current point release.
+The security repositories are managed by the Debian Security team, using their
+own dedicated infrastructure.
+
+As can be expected, security updates are meant to be deployed by users as soon
+as possible.
+
+### Stable Proposed Updates repository
+
+This repository is meant for *proposed* updates to the next point release. The
+purpose of this repository is to have a way of testing updates before they are
+included in the next point release.
+
+Only packages with issues tagged release-critical will be included in this
+repository, including both bugfixes and security fixes. Do note that
+packages with security fixes are immediately published in the security repository
+for consumption by end users; their inclusion in the proposed updates repository is
+purely so that they can be included as part of the next point release.
+
+The set of packages that actually end up in the point release is manually
+reviewed and selected
+by the Debian Stable Release maintainers, thus there is no guarantee that
+packages in this repository will be part of the next point release.
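+
+For concreteness, a downstream tracking the repositories described so far
+for a hypothetical stable release (here `stretch`; the suite names follow
+the usual Debian conventions of the time) might use an apt configuration
+along these lines:
+
+```
+# Released packages, updated only at point releases
+deb http://deb.debian.org/debian stretch main
+# Security updates, to be deployed as soon as possible
+deb http://security.debian.org/debian-security stretch/updates main
+# Candidates for the next point release (testing purposes only)
+deb http://deb.debian.org/debian stretch-proposed-updates main
+```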
+
+### Stable Updates repository
+
+The `stable-updates` repository exists for updates proposed to stable which are
+high-urgency or time-sensitive and thus should be generally available to users
+before the next point release. Typical examples of packages landing here are
+updates to timezone data, virus scanners and high-impact/low-risk bugfixes.
+
+All packages here will also be available in proposed updates and
+are only allowed into this repository on a case-by-case basis.
+
+As with security updates, this repository is meant to be used by all the users of
+a Debian stable release.
+
+### Backports repository
+
+The backports repository contains packages taken from the *next* Debian release
+(specifically from the testing suite) and rebuilt against the current Debian
+stable release. Backports allow users to upgrade specific interesting packages
+to newer versions while keeping the remainder of their system running the
+stable release.
+
+However, while backports will have seen a minimal amount of testing, the packages
+are provided on an as-is basis with no guarantee of stability. As such it's
+recommended to only cherry-pick the packages one needs from this repository.
+
+## Debian release flow conclusions
+
+From a purely downstream perspective there are various interesting aspects to
+this process.
+
+In the process going towards a release it's notable that even
+during the soft and full freeze periods Debian is already quite a stable
+baseline; as such the rebasing process for an Apertis product release can start
+while Debian is in freeze, as long as there is enough time left before the
+product release (around 8 to 9 months).
+
+After a Debian release there are clear repositories that a downstream
+should focus upon, namely the "stable updates" and "security"
+repositories, as well as updates included in point releases. The "stable
+proposed updates" repository can mostly be ignored on a day-to-day basis but gives
+interesting insights into what can be expected from the next point release. Finally,
+the backports repository should in general not be used unless a downstream
+has a high interest in versions of a package newer than what is available in the
+stable release. However, in that case extra effort should be put in place to
+track security issues and other bugfixes for that package, as Debian only
+provides it on a best-effort basis without the usual guarantees.
+
+
+# Linux kernel release flow
+
+Apertis follows the Linux kernel LTS releases to ensure it includes
+modern features and support for recent hardware. As such it's important to also
+look at the release flow of the Linux kernel itself and its impact. Linux sees
+a new major release about every 2 months, which typically is only supported
+until the next major release happens. However, once a year there is a long-term
+support release which is supported for 2 years.
+
+The following diagram shows the expected timelines for the current and next
+expected Linux long-term stable releases.
+
+
+
+## Process towards a release
+
+The kernel stabilisation process has two big phases: after every release there
+is a two-week *merge window* in which all the various changes
+lined up by the various subsystem maintainers are pulled into the main tree.
+At the end of this two-week
+period the first release candidate (rc1) is released and the merge window is
+closed. Afterwards only patches fixing bugs and security issues
+will be integrated, with a new release candidate coming out every week.
+
+Typically 7 or 8 release candidates will be released in each cycle followed by
+a final release, which means a new stable version of Linux is released every 9 to
+10 weeks.
+
+## Process after a release
+
+After each Linux release further maintenance is done in the stable git tree.
+These trees will only get further bug and security fixes, with releases
+being done on an as-needed basis. The support time depends on the specific
+release, which falls into one of two categories:
+* normal release, only supported until the next release;
+* long-term release, typically supported for two years.
+
+Currently the last kernel release of each year is expected to be a long-term
+release, supported for at least two years after release. Specific releases may
+be provided with longer upstream support depending on industry interest. For
+example the 4.4 kernel is getting a total of 6 years of support mainly due to
+interest from Android. Similarly the Linux 3.16 kernel is also getting a total
+of 6 years of support as that was the kernel used by the Debian Jessie release.
+For Linux 4.9 a similar longer cycle is to be expected as that was used in
+Debian Stretch; however, that hasn't been made official thus far,
+and at the time of this writing Linux 4.9 will go EOL in January 2019.
+
+## Linux release flow conclusions
+
+For usage in Apertis product releases only long-term releases are
+suitable. As there is a yearly LTS release of Linux with only a 2 year support
+cycle, it is recommended to ensure each yearly release of Apertis ships the
+latest Linux LTS release. This ensures both support for recent hardware as well
+as having a reasonable security support window.
+
+If downstream projects require a longer support period for a specific kernel
+release then it's recommended to align with other long-term support efforts
+instead, depending on requirements.
+
+# Apertis release flow
+
+The overall goal is for Apertis to do a yearly product release. These releases
+will be named after the year of the stable release; in other words the product
+release targeted at 2020 will be given major version 2020. A product release
+is intended to be based on both the most recent mainline kernel LTS release and
+the current Debian stable release. Since Debian releases roughly once every
+two years, that means that there will typically be two Apertis product releases
+based on a single Debian stable release. With Linux doing an LTS release on a
+yearly basis, each Apertis product release will be based on a different (and
+then current) Linux kernel release.
+
+To move to a yearly product release cycle the recommendation is to keep the
+current quarterly releases, but rather than treating all releases equally
+as is done today, to give each release a specific purpose depending on where it
+falls in the yearly cycle for a specific product release.
+
+The final product release is planned to occur at the end of Q1 every year, both to
+avoid the impact of the major holiday periods (Christmas/new year and the European
+summer) and to release close to the Linux kernel LTS release, maximising the
+use of its support cycle. Once a product release is published, it will continue
+to get updates for bug and security fixes, with a point release every quarter
+for the whole duration of the support period.
+
+The standard support period for Apertis is 7 quarters; in other
+words, from the initial release at the end of Q1 until the end of the *next*
+year.
+
+The various types of releases per quarter (without point releases) would be:
+
+| Quarter | Release type            | Support                                       |
+|---------|-------------------------|-----------------------------------------------|
+| Q4      | Release N-1 Preview     | Limited, until the Q1 product release         |
+| Q4      | Release N Development   | Limited, until the Q1 development release     |
+| Q1      | Release N-1 Product     | Full support, until 1.75 years after release  |
+| Q1      | Release N Development   | Limited, until the Q2 development release     |
+| Q2      | Release N Development   | Limited, until the Q3 development release     |
+| Q3      | Release N Development   | Limited, until the Q4 development release     |
+| Q4      | Release N Preview       | Limited, until the Q1 product release         |
+| Q4      | Release N+1 Development | Limited, until the Q1 development release     |
+| Q1      | Release N Product       | Full support, until 1.75 years after release  |
+| Q1      | Release N+1 Development | Limited, until the Q2 development release     |
+
+For each quarter the releases would be (with some examples):
+
+| Quarter | N-2 | N-1 | N    | N+1  | v2020   | v2021    | v2022     | v2023     |
+|---------|-----|-----|------|------|---------|----------|-----------|-----------|
+| Q4      | .3  | pre | dev0 |      | v2020.3 | v2021pre | v2022dev0 |           |
+| Q1      | .4  | .0  | dev1 |      | v2020.4 | v2021.0  | v2022dev1 |           |
+| Q2      | .5  | .1  | dev2 |      | v2020.5 | v2021.1  | v2022dev2 |           |
+| Q3      | .6  | .2  | dev3 |      | v2020.6 | v2021.2  | v2022dev3 |           |
+| Q4      | .7  | .3  | pre  | dev0 | v2020.7 | v2021.3  | v2022pre  | v2023dev0 |
+| Q1      |     | .4  | .0   | dev1 |         | v2021.4  | v2022.0   | v2023dev1 |
+| Q2      |     | .5  | .1   | dev2 |         | v2021.5  | v2022.1   | v2023dev2 |
+| Q3      |     | .6  | .2   | dev3 |         | v2021.6  | v2022.2   | v2023dev3 |
+
+The following diagram shows how this would look for Apertis releases up to
+2023:
+
+
+Further details about the various types of release will be given in the
+following sections.
+
+## Flow up to a product release
+
+The main flow towards a quarterly release will remain the same as it is now,
+which is documented on the
+[Apertis Release schedule](https://wiki.apertis.org/Release_schedule) page.
+However, depending on the type of release the focus may differ.
+
+### Development releases (Q4, Q1, Q2, Q3)
+
+For a development release, everything is allowed as the main focus is
+development. These releases can include bigger changes to the infrastructure as well as
+to the delivered software stack. At the end of every quarter there is an Apertis
+development release: this ensures that there can be ongoing development of the
+distribution even if the preparation for the next product release has entered a
+stabilisation phase.
+
+Rebasing on the upcoming stable version of Debian can only be done as part of a
+development release. The rebase can start in a quarter as soon as Debian hits
+the soft freeze stage.
+
+Development releases are versioned as `development <number>`, with numbering
+starting from 0. The version of the first development release for the 2020
+product release would be `Apertis 2020 development 0`, optionally shortened
+to `v2020dev0`.
+
+### Preview release (Q4)
+
+The goal of a preview release is to provide a preview of what will be the
+final product release, for further testing and validation by downstreams. As such,
+a preview release should achieve a high level of stability: this means that
+during a preview release cycle only non-disruptive software or infrastructure
+updates will be allowed.
+Similarly, new features can only be introduced if they
+pose a low risk to existing functionality and do not have an impact on the overall
+platform stability.
+
+During the preparation of a preview release extra focus should be given to
+bugfixing and testing.
+
+One important exception to the above considerations is to be made: preview releases
+should be released with the new Linux kernel LTS (either the final release or a
+release candidate) to ensure the product release will be done with the most
+recent LTS Linux kernel, maximising the overlap with the 2-year stable support
+period offered.
+
+As there is only one preview release for each product release, the version is
+the major product version followed by preview. For example `Apertis 2020
+preview`, which can be shortened to `v2020pre`.
+
+### Product release (Q1)
+
+As can be expected, the focus of the product release quarter is to deliver a
+high-quality release which can be supported for a longer period. For this
+release only security fixes, bugfixes and updates from the stable kernel release
+or from the Debian stable release are allowed.
+
+New features should not be included during this quarter, as it's
+unlikely there will be enough time for them to fully mature.
+
+The major version of the product release is simply the year in which the
+release is to be done. The minor version starts at 0 and is increased for each
+later point release. This means the initial product release for 2020 would be
+`Apertis 2020.0`, or simply shortened to `v2020.0`.
+
+## Process after a product release
+
+After a release has been done, it has an expected support
+life depending on the type of release, as outlined above.
+
+For non-product releases any post-release updates will directly go into the main
+repository for that specific release. For product releases a setup similar to
+Debian's is to be used to stage updates before a new point release is done. The
+repositories used by Apertis are outlined in the following sections.
+
+### Stable repository
+
+This is the main repository with the full *released* version. This repository
+only gets updated at point releases.
+
+Point releases will be done every three months. All downstreams are expected to
+pull directly from the stable repository.
+
+### Security repository
+
+For security issues a dedicated security repository is used. This repository
+only carries updated packages that include security fixes.
+
+This repository should be pulled directly by all downstreams and any updates
+rolled out at high priority. Updates from the Debian security repository will
+always be included in this repository.
+
+### Updates repository
+
+This repository includes updated packages to be included in the next Apertis
+point release. Only packages with high-priority bugfixes are allowed into this
+repository. Updated packages from the Debian stable-updates repository and point releases
+will be automatically included.
+
+Downstreams are recommended to include this repository, but it's not mandatory.
+
+### Backports repository
+
+This repository has backports of packages which are of special interest to
+downstreams but were not suitable for inclusion in the product release.
+
+Unless specific agreements have been made, the packages available in this
+repository are for experimentation only and are not supported as part of the
+product release.
+
+## Example images
+
+Apertis includes a big collection of packages which can be used in a variety of
+system use-cases.
+As it is impossible to test all combinations of packages, Apertis
+provides a set of example images for each type of system which has been
+validated by the Apertis project. While other use-cases can be supported, there
+cannot be a strict guarantee that Apertis is fit for purpose for those, as it
+hasn't been validated in those situations.
+
+Furthermore, as these Apertis images are meant as examples for product use-cases,
+they can include demonstration-quality software, which is neither intended nor
+validated to form the basis of a product.
+
+To clarify what is expected to be supported for each Apertis product release,
+documentation will be provided to explain what the scope of each example image
+is, which use-cases it validates and which parts of the software stack are
+fully supported for product usage.
+
+For the 2019 product release this document can be found in the
+["Release artifacts for Apertis v2019"](release-v2019-artifacts.md) document.
+
+## Apertis release flow conclusions
+
+The above sections outline a process for Apertis to both generate and support
+yearly product releases. They ensure that Apertis releases are always based on
+recent but mature upstream software. Each product release will include the
+very latest Linux LTS kernel as well as the current Debian stable release.
+
+What was intentionally not covered is how to manage forward-looking development
+during the non-development cycles, as this is separate from the release flow.
+However, there is no real blocker for doing development not intended to be part
+of the product release: deliverables can be delivered, for instance,
+via the backports repository or by other means to be defined further.
+
+Combining all the various types of releases, for a single product release 13
+different releases will be done. For example, for Apertis 2021 the schedule
+looks like this:
+
+| Quarter | Release                    | Name      | Type                      |
+|---------|----------------------------|-----------|---------------------------|
+| 2019Q4  | Apertis 2021 development 0 | v2021dev0 | development               |
+| 2020Q1  | Apertis 2021 development 1 | v2021dev1 | development               |
+| 2020Q2  | Apertis 2021 development 2 | v2021dev2 | development               |
+| 2020Q3  | Apertis 2021 development 3 | v2021dev3 | development               |
+| 2020Q4  | Apertis 2021 preview       | v2021pre  | preview                   |
+| 2021Q1  | Apertis 2021.0             | v2021.0   | stable release            |
+| 2021Q2  | Apertis 2021.1             | v2021.1   | stable point release      |
+| 2021Q3  | Apertis 2021.2             | v2021.2   | stable point release      |
+| 2021Q4  | Apertis 2021.3             | v2021.3   | stable point release      |
+| 2022Q1  | Apertis 2021.4             | v2021.4   | stable point release      |
+| 2022Q2  | Apertis 2021.5             | v2021.5   | stable point release      |
+| 2022Q3  | Apertis 2021.6             | v2021.6   | stable point release      |
+| 2022Q4  | Apertis 2021.7             | v2021.7   | stable point release      |
+
+Given this schedule, projects using Apertis (or its direct downstreams) have a
+rebase window of a year to move to the newer version: starting from when the
+preview release of the new version is done (for instance, v2022pre in 2021Q4)
+until the .7 stable point release of the old version (for instance, v2021.7),
+which is the end of Q4 to the end of the next Q4.
+
+# Release flow for the direct downstreams of Apertis
+
+The release cycle of the direct downstreams of Apertis is expected to follow the same process as that of
+Apertis. In other words, throughout the year the direct downstreams of Apertis will do two development
+releases based on top of the Apertis development releases, one preview release
+and a final product release.
+
+It is expected that the respective direct downstream releases will be done within a month of
+the quarterly Apertis release and will be made available to the downstreams further down the chain in that
+time frame.
+
+For a direct downstream product release it is expected that, in addition to the `stable`
+repository, the `updates` and especially the `security` repositories are tracked closely,
+with any updates from Apertis being made available in the direct downstream within a week. A similar
+time frame is expected for Apertis point releases.
+
+
+# Guidelines for product development on top of Apertis and its direct downstreams
+
+To make the best use of Apertis in product development, it is recommended to take
+the release timelines of Apertis and its direct downstreams into account when creating a product release
+roadmap. Since Apertis and its direct downstreams have a cadence of a new release once a year, users
+are driven to the same cadence by default. Given that the overlap of stable
+releases for two subsequent product releases is three quarters, users have a
+full year to rebase their work once the preview release for the next product
+release is published.
+
+The details about the use of Apertis and its direct downstreams will depend on the phase of the project,
+in particular whether it is in the pre-production development phase or in the
+post-production support phase.
+
+## Pre-production guidelines
+
+The pre-production phase is the phase before a new major version of software
+goes into production. This can be either before the product starts its production
+or when a new major software update is planned to be rolled out to products
+already in the field.
+
+Typically this phase consists of a period of heavy development (potentially
+interleaved with short stabilisation periods), followed by a potentially longer
+final stabilisation period before entering production.
+
+For the final stabilisation phase, the baseline used for Apertis and its direct downstreams should
+be focused on stability. This means either a preview or the current product
+release should be used. Care should be taken to ensure that there is still a
+reasonable window of support for the baseline distribution when production is
+planned to start. After production has started, the guidelines for
+post-production support should be taken into account.
+
+For the initial development phase there are two main options:
+* follow the development releases of Apertis or its direct downstreams;
+* follow the product releases of Apertis or its direct downstreams (switching at the preview stage).
+
+The first option allows the product development to use the very latest Apertis
+features and developments on top of the most recent software baseline which will
+form the basis of the future product release of Apertis or of its direct downstream, while the second
+option provides a more stable, but older, baseline allowing the product team
+to focus on their own software stack. These approaches can be
+mixed, for example by starting out early product development on the current
+Apertis (or one of its direct downstreams) development release to take advantage of more recent features,
+but following that baseline when it becomes the product release instead of
+moving to the next cycle of development releases. By mixing the approaches in
+this way the product team has the flexibility of choosing the
+baseline that best fits their priorities at any given time.
+
+The following diagram shows an example of such a mixed development:
+development starts on top of the then-current Apertis development
+release and is rebased early onto the next development versions of Apertis,
+such that the product's final 9-month freeze before SOP coincides with the
+product release of the Apertis version it's based on. If a product is based
+on a direct downstream of Apertis, then the chart would be nearly identical,
+replacing the Apertis labels with the name of the direct downstream.
+
+
+
+
+## Post-production support guidelines
+
+The post-production support phase is the phase where the product is out in
+the market and any software updates are primarily done for the purpose of fixing
+bugs and security issues.
+
+In this phase it's assumed that the release into the field has been done based
+on a product release of Apertis or of one of its direct downstreams. The product team is expected to track
+Apertis security fixes as they become available through the
+security repository of Apertis or its direct downstream, as well as new point releases (containing both security and
+bug fixes).
+
+It is up to the product team to further select and test these updates for their
+product and schedule software updates that work best for their schedule, with
+the recommendation to update devices in the field as quickly as possible,
+especially in the case of high-impact security fixes.
+
+When a new release of Apertis or of its direct downstream comes out, the product team is expected to
+update to this new version before the support for the previous Apertis release
+comes to an end. It is typically recommended to start the work to rebase on
+the new version of Apertis or of its direct downstream when the preview release becomes available, as the
+focus for Apertis is very much on stability at that point.
+
+The following diagram shows an example of such a flow, where the product
+begins the preparation for deploying an update based on the new Apertis
+version at the time of the preview release and targets deployment in the field
+when the old Apertis release support ends, which gives a window of a full
+year to do the necessary preparation and validation before deploying an update
+into the field. If a product is based
+on a direct downstream of Apertis, then the chart would be nearly identical,
+replacing the Apertis labels with the name of the direct downstream.
+
+
+
+## Product guideline conclusions
+
+As can be seen in the previous sections, Apertis and its direct downstreams try to give product
+teams the flexibility to use Apertis as they see fit for their needs within
+the constraints imposed by the support timelines.
+
+It should be noted, however, that these timelines are not set in stone: if
+there are business cases for having specific releases of Apertis or of its direct downstreams supported for
+an extended period then this is in principle possible. However, Apertis and its
+direct downstreams in turn have constraints from their upstreams to be able
+to rely on community support, which may limit the amount of support that can be
+provided.
+
+# Appendix: Change in release strategy
+
+This release flow concept is a departure from the initial concept for Apertis,
+which would rebase on every new Ubuntu release (once every 6 months).
+This
+resulted in two releases for every Ubuntu version, where in one quarter the
+project would rebase on
+the new Ubuntu release, and in the following quarter it would continue on that
+baseline with further updates and improvements.
+
+Conceptually there are two big changes with this new concept:
+* the switch to a longer-supported distribution release;
+* the switch from Ubuntu as a baseline to Debian.
+
+When the initial concept was set out, Ubuntu would support non-LTS releases for
+18 months (one year after the *next* Ubuntu release). Currently, however, the
+support for non-LTS releases is only 9 months (3 months after the *next* Ubuntu
+release), which is simply too short for supporting product usage even if the
+product has a very aggressive timeline.
+
+This means that, to fit the trade-offs/constraints mentioned in the introduction, a
+switch has to be made to releases with a longer support term, which in both the
+Ubuntu and Debian cases are released every 2 years, with 5 years of support.
+
+The rationale for switching from Ubuntu as a baseline to Debian has been
+outlined in more detail in the
+["The case for moving to Debian stretch or Ubuntu 18.04"](case-for-moving-to-debian.md)
+concept document.
+
+# Appendix: Distribution "freshness"
+
+A side-effect of the switch to distributions with a longer support
+cycle is that there are fewer updates on top of the baseline. As such,
+the software available in the distribution can be older than the latest and
+greatest from upstream or from more recent distribution releases (for instance,
+older than what is available
+in normal Ubuntu releases), which also means that not all the latest features might be
+available.
+
+This is a consequence of the trade-offs that are being made in the release
+process to best serve users of Apertis and its direct downstreams: stability and community support
+are preferred over having the very latest features. In case newer features are
+required, this can either be handled via the backports mechanism if only
+needed by specific users or, in the case of a feature useful to most users,
+by considering the inclusion of a newer version in the next release of Apertis or of its direct downstreams.
+
+A practical example of this happening is the way the Linux kernel
+is handled, as support for recent hardware devices is
+considered important for a wide variety of users (especially during the early
+product phases). However, this does mean a reduced community kernel support
+timeline when compared to a distribution kernel, so in situations where an
+update is considered, care should be taken to evaluate the trade-offs with respect
+to effort costs.
+
+Overall, with this release flow the latency for new updates to components from a
+newer distribution is at most two years. This is under the assumption that
+users looking for newer features are still in early development and are using
+the preview releases of Apertis or of its direct downstreams, and at that stage not yet the product
+release. Generally this is seen as a reasonable trade-off for most components.
diff --git a/content/designs/release-v2019-artifacts.md b/content/designs/release-v2019-artifacts.md
new file mode 100644
index 0000000000000000000000000000000000000000..63cac46f6ff0075adf42c0bc403f04298cd45cd0
--- /dev/null
+++ b/content/designs/release-v2019-artifacts.md
@@ -0,0 +1,146 @@
+---
+title: Release artifacts for Apertis v2019 [draft]
+short-description: Draft of the artifacts planned for the Apertis v2019 release
+authors:
+  - name: Emanuele Aina
+---
+
+# Introduction
+
+This draft document describes which artifacts are expected to be part of the
+Apertis v2019 release and what their goals are.
+
+The main kinds of artifacts are:
+
+* ospack: rootfs without the basic hardware-specific components like bootloader, kernel and hardware-specific support libraries
+* system image: combines an ospack with hardware-specific components in a snapshot that can be directly deployed on the supported boards
+  * Apt-based images: images meant for development, with a modifiable rootfs that can be customized with the [Apt](https://wiki.debian.org/Apt) package manager
+  * OSTree-based images: images with an immutable rootfs and a reliable update mechanism based on [OSTree](https://ostree.readthedocs.io/), more similar to what products would use than the Apt-based images
+* OSTree repository: server backend used by the OSTree-based images for efficient distribution of updates
+* sysroot: rootfs to be used for cross-compilation for platform and application development targeting a specific image
+* devroot: rootfs for targeting foreign platforms using binary emulation for platform and application development
+* nfs: kernel, initrd, dtb and rootfs tarball for network booting using TFTP and NFS
+
+# Supported platforms
+
+Architectures:
+
+ * `amd64`: the common Intel x86 64-bit platform, also known as `x86-64`
+ * `armhf`: the hard-float variant of the ARMv7 32-bit platform
+ * `arm64`: the ARMv8 64-bit platform, also known as `aarch64`
+
+[Reference systems](https://wiki.apertis.org/Reference_Hardware):
+ * `amd64`: [MinnowBoard Turbot Dual-Core (E3826)](https://wiki.apertis.org/Reference_Hardware/Minnowboard_setup)
+ * `amd64`: [VirtualBox](https://wiki.apertis.org/VirtualBox) for the SDK virtual machines
+ * `armhf`: [i.MX6 Sabrelite](https://wiki.apertis.org/Reference_Hardware/imx6q_sabrelite_setup)
+ * `arm64`: [Renesas R-Car M3 Starter Kit Pro (M3SK/m3ulcb)](https://wiki.apertis.org/Reference_Hardware/Rcar-gen3_setup)
+
+# Supported artifacts
+
+This is an overview of the release artifacts:
+
+* `minimal` ospacks
+  * amd64, armhf, arm64
+  * OSTree repository
+  * Apt-based and OSTree-based system images for the reference hardware platforms
+* `target` ospacks
+  * amd64, armhf, arm64
+  * OSTree repository
+  * Apt-based and OSTree-based system images for the reference hardware platforms
+* `basesdk` ospacks
+  * amd64
+  * Apt-based system image for VirtualBox
+* `sdk` ospacks
+  * amd64
+  * Apt-based system image for VirtualBox
+* `sysroot` ospack
+  * amd64, armhf, arm64
+  * tarball matching the target ospack
+* `devroot` ospack
+  * amd64, armhf, arm64
+  * tarball
+* `nfs` ospack
+  * amd64, armhf, arm64
+  * tarball and unpacked kernel artifacts for TFTP/NFS network booting
+
+## Minimal
+
+Minimal images provide compact example images for headless systems and
+are a good starting point for product-specific customizations.
+
+Other than the basic platform support needed to successfully boot on the reference
+hardware, the minimal example images ship the complete connectivity stack.
+
+The reference update system is based on OSTree, but APT-based images are also
+provided for development purposes.
+
+No artifact covered by the GPLv3 is part of the `minimal` ospacks and images.
+
+## Target
+
+Target images provide the most complete superset of the features offered by the minimal images,
+shipping full graphics support with a sample HMI running on top and
+applications with full multimedia capabilities.
+
+The reference update system is based on OSTree, but APT-based images are also
+provided for development purposes.
+
+No artifact covered by the GPLv3 is part of the `target` ospacks and images.
+
+The release also includes sysroots which include development tools, headers,
+and debug symbols for all the components shipped on target images. These
+sysroots contain software covered by the GPLv3, as they are meant for
+development purposes only.
+
+## Base SDK
+
+The base SDK images are meant to be run as VirtualBox VMs to provide a standardized,
+ready-to-use environment for developers targeting Apertis, both for platform
+and application development.
+
+Since the SDK images are meant for development, they also ship tools covered by
+the GPLv3 license.
+
+## SDK
+
+The full SDK images provide the same features as the base SDK images with
+additional tools for application development using the Canterbury application
+framework and the Mildenhall HMI.
+
+## Sysroot
+
+Sysroots are filesystem trees specifically meant for cross-compilation and
+remote debugging targeting a specific release image.
+
+They are meant to be read-only and target a specific release image, shipping
+all the development headers and debug symbols for the libraries in the
+release image.
+
+Sysroots can be used to cross-compile for Apertis from a third-party
+environment using an appropriate cross-toolchain. They are most suited for
+early development phases where developers focus on quick iterations and rely
+on fast incremental builds of their components.
+
+## Devroot
+
+Devroots are filesystem trees meant to offer a foreign-architecture build
+environment via containers and binary emulation via the QEMU user mode.
+
+They ship a minimal set of packages and offer the ability to install all the
+packages in the Apertis archive.
+
+Due to the nature of foreign-architecture emulation they impose a considerable
+overhead on build times compared to sysroots, but they avoid all the intricacies
+that cross-building involves and offer the ability to reliably build deb packages
+targeting foreign architectures.
+
+For more information about using devroots see the
+["Programming guidelines" section](https://developer.apertis.org/latest/programming-guide-tooling.html#development-containers-using-devrootenter).
+
+## NFS
+
+The release includes artifacts for network booting using the TFTP and NFS protocols:
+* kernel images for the reference architectures to be loaded via TFTP
+* initrd with kernel modules matching the image to be loaded via TFTP
+* DTBs (compiled device trees) for the reference hardware platforms to be loaded via TFTP
+* rootfs tarball to be loaded via NFS
diff --git a/content/designs/robustness.md b/content/designs/robustness.md
new file mode 100644
index 0000000000000000000000000000000000000000..e54d68a1e5e1b910c54169b781524542f46bb3a4
--- /dev/null
+++ b/content/designs/robustness.md
@@ -0,0 +1,821 @@
+---
+title: Robustness
+short-description: Dealing with loss of functionality
+ (partially-implemented)
+authors:
+  - name: Tomeu Vizoso
+---
+
+# Robustness
+
+## Introduction
+
+This design identifies circumstances that, though undesired because of
+the risk of loss of functionality, cannot be completely avoided, and
+provides suggestions for dealing with them in such a way that as little
+functionality as possible is lost.
+
+Note that improving D-Bus' robustness is a topic that will be covered at
+a later stage in its own design document. Regarding securing D-Bus services,
+please see the security design.
+
+## Requirements
+
+Minimize loss of data and loss of functionality due to data corruption
+in these abnormal circumstances:
+
+  - Unexpected power loss
+
+  - Unexpected removal of storage devices
+
+  - Unexpected lack of disk space
+
+  - Physical damage to the media and other hardware errors
+
+Minimize loss of functionality due to processes hogging these shared
+resources:
+
+  - CPU
+
+  - GPU
+
+  - I/O
+
+  - memory
+
+  - network queue
+
+  - D-Bus daemon
+
+## Approach
+
+This section explains how to address the requirements in several
+specific cases, taking into account different data sets and
+circumstances.
+
+### Application data
+
+This section contains recommendations about how to robustly deal with
+data generated by applications.
+
+#### General guidelines
+
+No software should assume that opening files will always succeed.
+Failure conditions should be dealt with, and the process should either
+continue running with as little loss of functionality as possible, or
+log a message and exit. Programs should do the same when writing
+data (the filesystem may be full, or any other mode of error might
+occur).
+
+For example, if the browser application finds out at start-up that the
+cookies file is corrupted, it should move the old file away (or just
+delete it) and run as usual, except that past persistent cookies will have
+been lost. Or, if there was an error when writing a new persistent cookie
+to disk, the browser would keep running with that cookie being transient
+(in memory only).
+
+In order to reduce the effects of data corruption, regardless of the
+causes, it would make sense to store different data sets in separate
+files, so that if corruption happens in, for example, the browser cookie
+store, it would not affect unrelated functionality such as playlists.
+
+For big data sets, Collabora recommends SQLite with either Write-Ahead
+Logging ([WAL]) or a [roll-back journal][SQLite-rollback]. For smaller data sets, a
+robust method is to write to a temporary file and rename it on top of
+the old one once finished. This method is called “atomic
+overwrite-by-rename” and is mostly used when editing a file in-place.
+
+POSIX requires the atomicity of [overwrite-by-rename].
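+As an illustration, here is a minimal sketch of the technique in Python (the
+helper name is invented for this example; GLib users would typically rely on
+`g_file_set_contents()`, which implements the same pattern):
+
+```python
+import os
+import tempfile
+
+def atomic_overwrite(path: str, data: bytes) -> None:
+    """Atomically replace the contents of `path` with `data`."""
+    directory = os.path.dirname(path) or "."
+    # Create the temporary file on the same filesystem so rename() stays atomic.
+    fd, tmp_path = tempfile.mkstemp(dir=directory)
+    try:
+        with os.fdopen(fd, "wb") as tmp:
+            tmp.write(data)
+            tmp.flush()
+            # Make sure the data hits the disk before the rename is visible.
+            os.fsync(tmp.fileno())
+        os.rename(tmp_path, path)  # atomic on POSIX filesystems
+    except BaseException:
+        os.unlink(tmp_path)
+        raise
+```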
+Btrfs, Ext3 and Ext4 give atomic overwrite-by-rename guarantees, as well
+as atomic truncate guarantees. The FAT filesystem guarantees neither.
+
+#### SQLite
+
+For applications using SQLite for their storage, Collabora recommends
+using either WAL or the rollback journal so that transactions are
+committed atomically. In addition, filesystem-specific tuning would be
+done by configuring the SQLite system library for optimal performance.
+
+WAL will be the best option in most cases; the rollback journal would be
+preferred only when transactions are very big (involving more than 100 MB)
+or writes are very seldom.
+
+Collabora will run the [TCL test harness] for SQLite in LAVA, to
+detect any issues in the specific configuration and software in the
+target platform. These include robustness tests that reproduce
+out-of-memory errors, input/output errors and abnormal termination
+(crashes or power loss).
+
+#### Tracker
+
+Tracker stores data in SQLite files, so the robustness considerations
+that apply to SQLite apply to Tracker as well. By default it uses WAL
+instead of the traditional rollback journal, which gives better
+performance for Tracker's workload with the same robustness guarantees.
+
+#### User settings
+
+For configuration settings in general, Collabora recommends using the
+[GSettings] API from GLib with the [dconf] backend. When updating
+the database, dconf will write the whole new contents to a new file,
+then atomically rename it on top of the old one.
+
+For bigger pieces of data (individual settings whose data component
+exceeds 1 KB), Collabora recommends using plain files via a known-robust
+file-handling library (such as [GKeyFile] from GLib, which is already
+a dependency) or SQLite.
+
+#### Media
+
+For media, the meta-data is stored in Tracker, with the actual data
+files in the /home filesystem and in attached removable devices.
+
+If the Tracker database that contains the meta-data has been corrupted,
+it should be moved to the side (or deleted) and recreated again by
+indexing all available media files. To minimize the chances of
+corruption, refer to [][Tracker].
+
+Software that reads the actual media files should assume that media files may
+contain invalid data, and should ignore them without further loss of
+functionality. Corrupted media files should not be displayed in the UI.
+
+#### Caches
+
+All software that uses a cache file should be ready to find that the
+cache is unusable and cope with it without loss of functionality
+(temporary degradation of performance is obviously expected in this case,
+though the mechanism by which the cache became corrupted will be treated
+by developers as a bug to fix).
+
+For example, if during start-up the Folks caches are found to be
+unreadable, libfolks would remove the corrupted cache files and recreate
+them, taking a longer time to reply to queries. As the application using
+Folks would be executing the queries asynchronously, the UI would keep
+being functional while the query executes.
+
+Examples of other components that use caches and that should cope with
+cache corruption are the browser and the email client.
+
+#### Filesystems
+
+The reliability with which data is stored depends on both the storage
+medium and the filesystem. In this section, we cover FAT32 and
+Btrfs.
+Ext4 is mentioned, as it is a popular default filesystem on many
+Linux distributions; however, it doesn't suit the needs of the rollback
+system, either for system rollbacks (see the System Updates and Rollback
+design) or for application rollbacks (see the Applications design).
+
+The FAT32 filesystem is not robust under abnormal circumstances since it
+was not made for devices which could be disconnected at any moment. In
+general, an approach where writes to the device are tightly controlled
+and restricted to small time-windows would help minimize the chances of
+corruption. See the *Media and Indexing* design for a detailed
+explanation of the issues and suggestions.
+
+The Ext4 filesystem is quite robust under power failure by default. It
+can be made even more robust by [mounting it in data=journal][kernel-ext4]
+mode, but at a large cost to performance.
+
+Btrfs has been created on very robust principles, building upon the
+experience of Ext4. Some brief technical details are provided at the end
+of this document in [][BTRFS overview].
+
+##### Filesystem options
+
+Filesystems usually have parameters that can be tuned to suit specific
+workloads. Some of them affect performance as well as robustness, either
+by trading off between the two, or by taking advantage of specific
+hardware features available with the storage media.
+
+  - **FAT32** is a simple filesystem that does not have many filesystem
+    options related to performance or robustness. Since we will not be
+    creating any FAT32 partitions ourselves, only mount-time options are
+    interesting for us. The recommended options are listed below:
+
+      - sync, flush
+        These filesystem options ensure that the kernel, as well as the
+        filesystem, flush data to the partition as soon as possible.
+        This greatly reduces the chances of data loss or filesystem
+        corruption when USB drives are yanked out by the user.
+
+      - ro (read only)
+        It is recommended that FAT32 partitions be mounted read-only to
+        avoid filesystem corruption, and other related problems as
+        detailed in the “*Media and Indexing design*”, in the section
+        “*Indexing database on removable device*”.
+
+  - **Btrfs** is relatively new, and so does not have many options
+    relevant to our needs of enhancing reliability on eMMC storage
+    media. The available options are listed below.
+
+      - *Mount-time options:*
+
+          - commit=number (default: 30)
+            Set the interval of periodic commit. This option is recent
+            [(since kernel 3.12)][btrfs-mount-options].
+
+          - ssd
+            This option enables SSD-specific optimizations and disables
+            some optimisations meant specifically for rotating media. This
+            option is enabled automatically on non-rotating storage.
+
+          - recovery (default: off)
+            This option can be used to attempt recovery of a corrupted
+            filesystem (see [][Repair and recovery]).
+
+      - *Filesystem creation options:*
+
+          - \-s sector-size
+            This is the size of the filesystem blocks used for
+            allocations. Ideally, this should be the same size as the
+            block size for the storage medium.
+
+          - \-M
+            This sets Btrfs to use “mixed block groups”, a mode that
+            stores data and metadata chunks together on disk for more
+            efficient space utilization on small filesystems, but
+            incurs a performance penalty on large ones. This option is
+            not mature and will be evaluated in the future.
+
+The System Updates and Rollback design describes the partition layout
+for Apertis. Not all the partitions have the same requirements, so both
+the FAT32 and Btrfs filesystems are used.
+The partitions are configured as follows:
+
+  - **Factory Recovery** – This partition is never mounted read-write
+    and must be readable by the boot loader. Currently the boot loader
+    for Apertis – U-Boot – does not support Btrfs. While patches exist
+    to add that functionality, they have not yet seen widespread
+    testing. FAT32 will likely be the filesystem chosen for the factory
+    recovery image.
+
+  - **Minimal Boot partitions** – These partitions must also be readable
+    by the boot loader, and are currently FAT32. They are not normally
+    mounted at run-time; instead they are created, mounted, and
+    populated by the system update software once – and only ever
+    accessed by the boot loader afterwards. They will be mounted with
+    the “sync” and “flush” flags.
+
+  - **System** – Since Btrfs provides an excellent snapshot mechanism to
+    assist system rollbacks (see [][Cheap, fast, and atomic snapshots and rollback]),
+    this partition will be populated with a
+    Btrfs filesystem created with the appropriate sector size for the
+    storage device. It may be created with mixed block groups to save
+    storage space if that option does not lead to instability. It will
+    be mounted with the ssd option as well as read-only. During a system
+    update a single subvolume of the system subvolume will be mounted
+    read-write. The repair mount option will never be attempted on the
+    system partition; instead, rollbacks or factory recovery will be used,
+    to avoid potentially putting the system into an unknown state.
+
+  - **General Storage** – This partition shares similar requirements with
+    the system partition. It will be Btrfs, created with an appropriate
+    sector size and possibly mixed block groups. It will be mounted with
+    the ssd option. This is the only built-in non-volatile storage that
+    will always be mounted read-write. In the case of a damaged
+    filesystem, repair may be attempted on this partition.
+
+Additionally, there are 2 partitions for raw status flag data that do
+not use filesystems at all. See the System Updates and Rollback design
+for more details.
+
+##### Checksumming
+
+Checksumming is used for detecting filesystem corruption from any
+cause. Different filesystems have different mechanisms for
+checksumming, which give us coverage for various different causes of
+filesystem corruption. Each mechanism consumes I/O and CPU resources,
+and that must be weighed against the advantages that it gives us.
+
+It is important to note that checksumming does not protect us against
+corruption or help us in fixing the root cause of the corruption; it
+only allows us to detect filesystem corruption when it happens. Hence,
+it is only useful as a warning sign, and recovering from data
+corruption is beyond the scope of this feature.
+
+  - **FAT32** is a very old and simplistic filesystem, and it has no
+    inbuilt facilities for checksumming.
+
+  - **Btrfs** maintains a *checksum tree* for all the blocks that it
+    allocates and writes to. Hence, all file data and metadata is
+    checksummed. This is the default behaviour, and the current checksum
+    algorithm uses few resources. This method of checksumming can detect
+    all the ways in which corruption can occur to data on the
+    filesystem. See [][Checksumming] for more detail.
+
+  - **Ext4** maintains checksums for journal data only; no checksumming
+    of file data takes place.
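+
+As a small illustration of how cheap block checksumming can be, the sketch
+below (Python; the measured numbers are machine-dependent, and zlib's plain
+CRC-32 stands in here for the kernel's optimised implementation) times a
+CRC-32 checksum over a filesystem-sized block:
+
+```python
+import time
+import zlib
+
+BLOCK_SIZE = 4096            # a typical filesystem block size
+ITERATIONS = 100_000
+block = bytes(BLOCK_SIZE)    # dummy data block
+
+start = time.perf_counter()
+for _ in range(ITERATIONS):
+    zlib.crc32(block)
+elapsed = time.perf_counter() - start
+
+# Rough per-block cost of the checksum; expect it to be far below the
+# cost of actually reading or writing the block from storage.
+print(f"CRC-32 over {BLOCK_SIZE} bytes: {elapsed / ITERATIONS * 1e6:.2f} µs")
+```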
+
+##### Alignment
+
+The first piece of tuning that a filesystem on flash storage needs is
+a proper mapping of the filesystem blocks to the page size of the
+erase blocks on the flash. This consists of two parts:
+
+1. Ensuring that the filesystem and storage erase block sizes match,
+   using filesystem creation options.
+
+2. Aligning the block allocations in the filesystem with the storage
+   blocks by using the appropriate offsets while partitioning, or while
+   creating the filesystem.
+
+If either of these is not satisfied, each filesystem block write will
+trigger two or more flash block writes, reducing the performance as
+well as the reliability of the MMC card.
+
+The storage erase block size [can be read][get-flash-erase-size] from /proc/mtd or from
+U-Boot, but the flash storage can report something different from the
+real numbers. Some sizes are available on the [Linaro wiki][linaro-sizes].
+Linaro-image-tools is [now able][linaro-alignment] to generate images with a correct
+alignment.
+
+##### Testing
+
+Collabora will add tests to LAVA for testing how FAT32 and Btrfs behave
+on the i.MX6 under stress, as well as for tuning the above-mentioned
+parameters for reliability and performance.
+
+#### Root filesystem
+
+The approach will be to mount as many parts of the root filesystem
+read-only as possible, such that the only writes to it would be during
+updates. This would reduce the chances of catastrophic filesystem
+corruption in the event of power failure, and of invalid system file
+modification by bugs in system or application software. The only
+partition that is to be mounted writable is the user partition that will
+be mounted in /home. All the other writable parts of the / filesystem
+will be backed by tmpfs, located in RAM. We will avoid the lack-of-space
+problem by storing in tmpfs only small files or files which don't take
+space (lock files, socket files). Bigger files such as programs,
+libraries and configuration files will remain on disk and available
+read-only.
+
+See the *System Updates and Rollback* design for detailed information
+about the robustness of the update process.
+
+#### Other filesystems
+
+The system should be able to function even if mounting one or more of
+the non-essential filesystems fails. Even if the system is able to keep
+running, it would do so with reduced functionality, so some recovery
+action would need to be taken in order to regain the lost functionality.
+The system should try to recover automatically as far as possible. In
+the case of unrecoverable system failure, the user can be instructed at
+system boot to request technical assistance at a service shop.
+
+#### Main storage
+
+In case of power loss, the flash media can become corrupted due to how
+writes are performed. Apertis will be notified via a GPIO signal 100
+milliseconds before power is completely lost, in order to give the flash
+controller time to commit to non-volatile media what is in its cache.
+
+Given the short time available and the general slowness of flash devices
+when writing, we recommend that the signal is handled in the kernel,
+because userspace will not have enough time to react (depending on the
+load and the scheduler, it could take from 10 ms to 100 ms for the
+signal to start being processed by a userspace process). A device driver
+should be written that, when the GPIO signal is received:
+
+1. stops flushing dirty pages to the drive,
+
+2. tells the flash controller to flush its caches to permanent storage,
+   and
+
+3. starts the shutdown sequence.
+
+The device driver will start handling the signal 10–100 µs after the
+GPIO is activated. In spite of this, if the device has big caches and is
+slow to write, corruption of arbitrary data blocks can still happen.
+
+In general, drive health data should be monitored so that the user can
+be notified about disk failures which require a garage visit for
+hardware replacement.
+
+As no more dirty pages will be flushed to the storage device when the
+GPIO signal is received, the data in the page cache will be lost. To
+reduce the amount of data that could be lost, eMMC reliable writes can
+be used, and the page cache configuration can be tuned. But it has to be
+noted that the use of reliable writes and reducing the amount of in-flight
+data is a trade-off against performance that can be quantified only on
+the final hardware configuration through direct experimentation.
+
+#### Removable devices
+
+External devices that can be removed at any moment are not reliable for
+writing critical data. In addition to the problem of corruption of
+files being written, wear leveling by the controller might corrupt
+unrelated blocks which might even contain the directory table or the
+file allocation table, rendering the whole partition unusable.
+
+The quality of external storage devices such as flash drives varies
+greatly; in some cases the device will unexpectedly stop responding to
+commands, or data will be lost. Applications that write to removable
+drives must be robust enough to be able to continue in the face of such
+errors with minimal loss of functionality.
+
+As mentioned in [][Filesystems], the safest way to use
+removable drives is by restricting the processes that can write to the
+drive, and minimizing the time-window for the writes. For that to be
+practical, there should be a system service that is the only one allowed
+to write to removable devices and that would accept requests from
+applications, remount the device read-write, write the new contents,
+then remount it read-only again.
+
+Since, for interoperability reasons, the filesystem used in removable
+devices is FAT32, in addition to the issues mentioned in this section,
+the robustness considerations that were explained earlier in [][Filesystems]
+also apply.
+
+### Mitigating the effects of lack of disk space
+
+In order to reduce the chances that the system will find itself in a
+situation where lack of disk space is problematic, it is recommended
+that available disk space is monitored and applications notified so they
+can react and modify their behavior accordingly. Applications may choose
+to delete unused files, delete or reduce cache files, or purge old data
+from their databases.
+
+The recommended mechanism for monitoring available disk space is for a
+daemon running in the user session to call *statvfs* (2) periodically on
+each mount point and notify applications with a D-Bus signal. [Example
+code][GNOME-housekeeping] can be found in the GNOME project, which uses a similar
+approach (polling every 60 seconds).
+
+Additionally, so that error messages can be stored even in low-space
+conditions, it is recommended that *journald* is configured to leave an
+amount of free space smaller than the reserved blocks of the filesystem
+that backs the log files. This way, applications will still be able to
+log messages after they have consumed all the space available to
+them.
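+
+As an illustration of the disk-space monitor recommended above, here is a
+minimal sketch in Python; the callback is a placeholder for the real D-Bus
+signal emission, and the threshold and mount points are invented for this
+example:
+
+```python
+import os
+import time
+
+LOW_SPACE_THRESHOLD = 0.05  # warn when less than 5% of the blocks are free
+MOUNT_POINTS = ["/home"]    # mount points to watch; adjust per system
+
+def free_fraction(path: str) -> float:
+    """Return the fraction of free blocks available to unprivileged users."""
+    st = os.statvfs(path)
+    return st.f_bavail / st.f_blocks if st.f_blocks else 1.0
+
+def notify_applications(path: str, free: float) -> None:
+    # Placeholder: a real implementation would emit a D-Bus signal here.
+    print(f"low disk space on {path}: {free:.1%} free")
+
+while True:
+    for mount_point in MOUNT_POINTS:
+        free = free_fraction(mount_point)
+        if free < LOW_SPACE_THRESHOLD:
+            notify_applications(mount_point, free)
+    time.sleep(60)  # same polling interval as the GNOME example
+```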
+
+In case applications cannot be trusted to properly delete non-essential
+files, a possibility would be for them to state in their manifest where
+such files will be stored, so that the system can delete them when needed.
+
+In order to make sure that malfunctioning applications cannot cause
+disruption by filling filesystems, it would be required that each
+application writes to a separate filesystem.
+
+It may be worth noting that temporary directories should be emptied on
+reboot.
+
+### Resource management
+
+The robustness goal of resource management is to prevent one or more
+applications from disrupting basic functionality due to excessive
+resource consumption. The basic mechanism for this is to allocate
+resources in such a way that applications cannot starve services in the
+base system. This is to be achieved firstly by changing the resource
+allocation policy to give higher priority to services, and secondly by
+limiting the maximum amount of resources that an application can consume
+at a time.
+
+Resource limits are capable of helping ensure a process does not render
+the whole system unresponsive. However, some design decisions also play
+an important role here. If the user has no way to kill the process that
+became too slow or unresponsive, the user experience will suffer. The
+same goes for the case in which an application gets stuck in a failing
+scenario, such as a web browser automatically loading pages that were
+open when the browser closed unexpectedly. For these reasons, care must
+be exercised while designing the user interactions for both the system
+chrome and applications, to be sure such cases are addressed.
+
+If, despite throttling, some processes still impact the overall user
+experience negatively because of excessive resource usage, there is the
+option of identifying those processes and terminating them. Collabora
+recommends against this because it is very difficult to automatically
+distinguish between processes that use large amounts of resources due to
+malfunction or maliciousness and processes that use excessive resources
+for legitimate purposes. Killing the wrong process may free up resources,
+but is likely to be perceived by the user as a severe defect in the
+overall user experience.
+
+As a general recommendation, for optimal responsiveness, applications
+should not block the UI thread when calling anything that is not assured
+to return almost immediately, which includes all local or remote I/O
+operations. When the potential duration of an operation is a
+considerable portion of the commonly-considered maximum acceptable
+response time (100 ms), it should be done asynchronously. GLib contains
+[asynchronous APIs][GNOME-async] for I/O in its [file][GNOME-file-ops] and [streaming][GNOME-streaming]
+classes.
+
+#### CPU
+
+To make sure that important processes have available CPU cycles even
+when malfunctioning or malicious applications monopolise the CPU, it is
+recommended to set task scheduler priorities according to the importance
+of processes. Systemd can do this for services by setting the
+[CPUSchedulingPriority][systemd-exec] property in the service unit file of the
+process. When the process described by the service unit file starts new
+processes, they stay in the same cgroups and they keep the same
+CPUSchedulingPriority.
+
+At present (Q1 2014), systemd manages the user session on target images
+but not on the SDK.
+With the user session managed by systemd, the
+priorities of applications are no longer set by the application launcher
+using [sched_setscheduler (2)].
+
+If there are processes that need real-time capabilities, or that should
+have very low CPU access, the CPUSchedulingPolicy property can be used
+to change to the rr (real-time) or idle scheduling policies. Real-time
+access for a process should be carefully considered and tested because
+it can have a negative impact on the process and even the entire system.
+
+For identifying processes that use an excessive amount of CPU, the
+[cpuacct] cgroups controller can be used.
+
+Though it is not recommended to automatically terminate local
+applications with excessive CPU usage, it makes sense for web pages. Web
+pages are not screened before they execute on the system, hence it is
+important to ensure that their ability to disrupt system functionality
+is minimised. For this, WebKit can detect when a block of JavaScript
+code has been executing for too long, pause it, and give the embedding
+application the possibility of canceling the execution of this block of
+code. Collabora has added API to WebKit-Clutter for this.
+
+#### I/O
+
+Similar to CPU usage, Collabora recommends giving priority to important
+processes when there is contention for I/O bandwidth. Important services
+should have a value for the property
+IOSchedulingPriority lower than 4 (the default). If, for any reason,
+some applications need priorities other than the default, the
+application launcher can use the [ioprio_set] (2) syscall to change
+their priority. When the process described by the service unit file
+starts new processes, they stay in the same cgroups and they keep the
+same IOSchedulingPriority.
+
+#### Memory
+
+Collabora recommends putting a single limit on the amount of memory that
+the whole application set can allocate, so that a fair reserve is left for the
+base software. This limit should be just big enough that the Apertis
+instance never reaches the “out of memory” ([OOM]) condition at the
+system level. For example, if the total memory available for
+processes is 1GB, there is no swap, and we know that the services in the
+base system should need a maximum of 300MB, then all applications should
+belong to a cgroup that is limited to 700MB of memory.
+
+In specific cases, it may make sense to put a different limit on a
+specific application, but it can easily be counterproductive and cause a
+waste of memory.
+
+Something else worth doing is to make sure that the [OOM killer][lwn-oom-killer]
+selects applications for killing instead of system services. For this,
+the systemd property OOMScoreAdjust can be used to reduce the chances
+that a service will be killed. For applications, it is recommended that
+the application launcher sets the application's /proc/\<pid\>/oom\_score\_adj (see [here][kernel-proc])
+to a value higher than 0. The ideal value may vary depending upon the importance
+of each application.
+
+With the example setup mentioned before, the OOM killer will terminate
+the bulkiest application when one of these conditions is met:
+
+  - The total memory taken by applications all together is going to
+    increase over 700MB.
+
+  - The total memory taken by all processes (services plus applications)
+    is going to increase over 1GB.
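+
+To illustrate, here is a minimal sketch in Python of how a launcher could
+adjust the OOM score of a newly spawned application; the function name and
+the chosen value are invented for this example, and a real launcher would
+pick the value per application:
+
+```python
+import subprocess
+
+def launch_with_oom_score(argv, oom_score_adj=500):
+    """Spawn an application and make it a preferred OOM-killer victim.
+
+    oom_score_adj ranges from -1000 (never kill) to 1000 (kill first);
+    values above 0 make the process more likely to be chosen than the
+    system services, whose score systemd can lower via OOMScoreAdjust.
+    """
+    proc = subprocess.Popen(argv)
+    # The launcher owns the child process, so it may raise the score;
+    # lowering it below the default would require CAP_SYS_RESOURCE.
+    with open("/proc/%d/oom_score_adj" % proc.pid, "w") as f:
+        f.write(str(oom_score_adj))
+    return proc
+
+# Hypothetical usage; the application path is only an example:
+# launch_with_oom_score(["/usr/bin/some-app"])
+```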
+
+To make better use of the available memory, it's recommended that
+applications listen to the cgroup notification
+[memory.usage_in_bytes] and, when it gets close to the limit for
+applications, start reducing the size of any caches they hold in main
+memory. It may be good to do this inside the SDK and provide
+applications with a GLib signal that they can listen for.
+
+#### Network queue
+
+Processes would be classified into cgroup classes such as:
+
+  - Interactive (VoIP, internet radio)
+
+  - Semi-interactive (web pages, maps)
+
+  - Asynchronous (mail, app notifications, etc.)
+
+  - Bulk (downloads, system updates)
+
+Cgroup controllers are only used for classification of outgoing packets.
+[NETPRIO_CGROUP] and [NET_CLS_CGROUP] would be used for
+setting the priority, and for classifying processes into cgroups. By
+thus tagging packets with the cgroup of applications and services,
+*[tc]* can be used to set limits on the rate at which processes send
+packets (<http://lartc.org/howto/>).
+
+Bandwidth rate-limiting would be required to ensure interactive streams
+do not get starved by lower-priority streams.
+
+There is little we can do about latency for applications like VoIP,
+since even when the bandwidth is sufficient, the bottlenecks are the
+hardware buffers, queues, and scheduling on various devices outside the
+control of our system. This is an open problem in networking, and a
+large part of it is related to [Bufferbloat].
+
+Note that there's no robustness issue that can be prevented by limiting
+the rate at which processes receive incoming packets.
+
+#### GPU
+
+As explained in the WebGL design, the [GL_EXT_robustness]
+extension provides a mechanism by which the watchdog in the GL
+implementation can reset the GPU, invalidating all GL contexts and thus
+stopping all GPU activity.
+
+Unfortunately, this only prevents denial of service (DoS) conditions
+caused by WebGL, because processes must opt in to use this extension.
+Thus, applications may intentionally or unintentionally ignore the
+extension and continue monopolising the GPU. Within the web browser,
+scripts that use WebGL and take over the GPU will be interrupted and
+terminated by the browser.
+
+If the system runs its own GL implementation, then it could monitor GPU resource
+usage and reset those contexts that seem to be disrupting the rest of
+the system. It could notify processes via the GL\_EXT\_robustness
+extension and even terminate them if they ignore the context-reset
+notifications.
+
+#### Accounting
+
+Besides setting limits on resources, cgroups also allow the retrieval of
+resource usage metrics. As examples, for CPU usage the *cpuacct* cgroup
+controller contains the *usage*, *stat* and *usage\_percpu* reports; the
+*memory* controller provides usage data in its *stat* report; the
+*blkio* controller has *throttle.io\_serviced* and
+*throttle.io\_service\_bytes*.
+
+## USB undervoltage
+
+In the case that the system momentarily isn't able to power connected
+USB devices such as MP3 players or smartphones due to voltage drops, the
+system will power these devices off and on again, so that the connection
+gets re-established and the user experience is affected as little as
+possible.
+
+## Risks
+
+  - FAT32 is fundamentally unreliable, especially on removable devices.
+
+  - Robustness of flash media varies greatly and the user may not be
+    able to distinguish failures caused by the hardware from failures
+    due to the software.
+
+ - Excessively low resource limits for applications can lead to
+   resource waste; excessively high ones may be less effective in
+   avoiding DoS. There may not be a good middle ground.
+
+ - The heuristics used to determine when to kill a process with
+   excessive resource usage are not perfect and can cause major
+   failures from the user's point of view.
+
+ - If Vivante does not implement GL\_EXT\_robustness properly, web
+   pages could DoS the whole system.
+
+ - Bugs in the OpenGL implementation can lead to instability, data loss
+   and privacy breaches that can be triggered from web pages.
+
+ - If the flash media loses power while a block is open for writing, it
+   is possible that several random blocks elsewhere in the same drive
+   will be corrupted. This can affect other filesystems, even if they
+   are mounted read-only.
+
+## Design notes
+
+The following items have been identified for future investigation and
+design work later in the project and are thus not addressed in this
+design:
+
+ - Vulnerability to DoS attacks in D-Bus and proposed solutions.
+
+ - Optimization of the SQLite configuration parameters for the specific
+   filesystems in use in Apertis.
+
+No updates as of March 2014.
+
+## BTRFS Overview
+
+The most powerful feature of [Btrfs] is the fact that all
+information (data + metadata) is stored in the same basic data
+structures, and all modification of these data structures is performed
+in a copy-on-write (CoW) fashion.
+
+Since all information on disk is stored using the same type of data
+structure, metadata and data can share features such as checksumming
+and striping.
+
+This, combined with the fact that Btrfs uses CoW when modifying all
+information, means that, in theory, the filesystem is always consistent
+if the storage device supports “[Force Unit Access]” correctly.
+However, in practice, filesystem bugs, a lack of maturity in the code,
+and other (unforeseen) problems may prevent this.
+
+### BTRFS robustness supporting features
+
+#### Cheap, fast, and atomic snapshots and rollback
+
+All snapshots in Btrfs are CoW copies of the subvolume being
+snapshotted, with an incremented reference count for the blocks. As a
+result, creating snapshots is very fast, and they take up a negligible
+amount of space. Just like every other operation, the snapshot is
+created atomically by the use of transactions and sequenced flushes.
+Further, all snapshots are actually just subvolumes, and hence can be
+mounted on their own.
+
+Unlike LVM2, which creates snapshots in the form of block devices that
+can be mounted, Btrfs creates snapshots in the form of subvolumes, which
+are represented as subdirectories.
+
+Even though snapshots are displayed in a subdirectory, they are not
+"owned" by that subvolume. Snapshots and subvolumes are identical in
+Btrfs, and are first-class citizens with respect to other subvolumes.
+This means that the default subvolume can be set at any time; the change
+will take effect the next time the filesystem is mounted. All
+subvolumes, except the top-level subvolume, can also be deleted,
+irrespective of their relationships with each other.
+
+#### Repair and recovery
+
+If, for any reason, the root node or the superblock gets corrupted and
+the filesystem cannot be mounted, mounting in recovery mode will make
+btrfs check the superblock (or alternate superblocks, if the superblock
+is also corrupted) for alternate roots from previous transactions.
+This is possible because all modifications to the Btrfs trees are done
+in a CoW manner and existing roots are not deleted. The filesystem
+stores the last four roots as a backup for the recovery option.
+
+#### Checksumming
+
+The header of every chunk of space in Btrfs has space for 32 bytes of
+checksums of the chunk itself. In addition, there is a checksum tree
+which maintains checksums for each block of data. Since the data as well
+as the metadata blocks are referenced in the checksum tree, all
+information in the filesystem is checksummed.
+
+Currently, Btrfs uses the CRC-32 checksum algorithm, but there are plans
+to upgrade that, and to add the option of setting the checksum algorithm
+when the filesystem is created.
+
+[WAL]: http://www.sqlite.org/draft/wal.html
+
+[SQLite-rollback]: http://www.sqlite.org/draft/lockingv3.html#rollback
+
+[overwrite-by-rename]: http://pubs.opengroup.org/onlinepubs/009695399/functions/rename.html
+
+[TCL test harness]: http://www.sqlite.org/testing.html#tcl
+
+[GSettings]: http://developer.gnome.org/gio/stable/GSettings.html
+
+[dconf]: https://live.gnome.org/dconf
+
+[GKeyFile]: https://developer.gnome.org/glib/2.37/glib-Key-value-file-parser.html
+
+[kernel-ext4]: http://kernel.org/doc/Documentation/filesystems/ext4.txt
+
+[btrfs-mount-options]: https://btrfs.wiki.kernel.org/index.php/Mount_options
+
+[get-flash-erase-size]: http://processors.wiki.ti.com/index.php/Get_the_Flash_Erase_Block_Size
+
+[linaro-sizes]: https://wiki.linaro.org/WorkingGroups/KernelArchived/Projects/FlashCardSurvey
+
+[linaro-alignment]: https://bugs.launchpad.net/linaro-image-tools/bug/626907
+
+[GNOME-housekeeping]: http://git.gnome.org/browse/gnome-settings-daemon/tree/plugins/housekeeping/gsd-disk-space.c#n693
+
+[GNOME-async]: http://developer.gnome.org/gio/stable/async.html
+
+[GNOME-file-ops]: http://developer.gnome.org/gio/stable/file_ops.html
+
+[GNOME-streaming]: http://developer.gnome.org/gio/stable/streaming.html
+
+[systemd-exec]: http://0pointer.de/public/systemd-man/systemd.exec.html
+
+[sched_setscheduler (2)]: http://www.kernel.org/doc/man-pages/online/pages/man2/sched_setscheduler.2.html
+
+[cpuacct]: http://www.kernel.org/doc/Documentation/cgroups/cpuacct.txt
+
+[ioprio_set]: http://www.kernel.org/doc/man-pages/online/pages/man2/ioprio_set.2.html
+
+[OOM]: http://en.wikipedia.org/wiki/Out_of_memory
+
+[lwn-oom-killer]: http://lwn.net/Articles/317814/
+
+[kernel-proc]: http://www.kernel.org/doc/Documentation/filesystems/proc.txt
+
+[memory.usage_in_bytes]: http://www.kernel.org/doc/Documentation/cgroups/memory.txt
+
+[NETPRIO_CGROUP]: http://lwn.net/Articles/474695/
+
+[NET_CLS_CGROUP]: http://docs.fedoraproject.org/en-US/Fedora/16/html/Resource_Management_Guide/sec-net_cls.html
+
+[tc]: http://lartc.org/manpages/tc.txt
+
+[Bufferbloat]: http://en.wikipedia.org/wiki/Bufferbloat
+
+[GL_EXT_robustness]: http://www.khronos.org/registry/gles/extensions/EXT/EXT_robustness.txt
+
+[Btrfs]: https://btrfs.wiki.kernel.org/
+
+[Force Unit Access]: http://en.wikipedia.org/wiki/SCSI_Write_Commands#Write_.2810.29
diff --git a/content/designs/secure-boot.md b/content/designs/secure-boot.md
new file mode 100644
index 0000000000000000000000000000000000000000..375c9ab239f2dedb0a47a0201a1e0942d0720129
--- /dev/null
+++ b/content/designs/secure-boot.md
@@ -0,0 +1,655 @@
+---
+title: Apertis secure boot
+authors:
+  - name: Sjoerd Simons
+  - name: Denis Pynkin
+---
+
+# Apertis secure boot
+
+## Introduction
+
+For both privacy and security reasons it is important for modern devices
+to ensure that the software running on the device hasn't been tampered
+with. In particular, any tampering with software early in the boot
+sequence will be hard to detect later, while giving the attacker a large
+amount of control over the system. To solve this issue, various vendors
+and consortia have created technologies to combat it, known under names
+such as "secure boot", "high assurance boot" (NXP) and "verified boot"
+(Google Android/ChromeOS).
+
+While the scope and implementation details of these technologies differ,
+the approach to providing a trusted boot chain tends to be similar
+between all of them. This document discusses, at a high level, how that
+aspect of the various technologies works and how it can be introduced
+into Apertis.
+
+
+## Boot sequence
+
+To understand how secure boot works, one first has to understand how
+booting works. From a high-level perspective a CPU is a very simple
+beast: it needs to be pointed at a stream of instructions (code) which
+it will then be able to execute. Without instructions a CPU cannot do
+anything. The instructions also need to be in a region of memory which
+the CPU can access. However, when a device is powered on, the code that
+is meant to be run on it (e.g. Linux) will not be in memory yet. To make
+matters worse, at power-on main memory (dynamic RAM) will not even be
+accessible by the CPU yet! To solve this problem some bootstrapping is
+required, typically referred to as booting the system.
+
+The very first step in the boot process after power on is to get the CPU
+to start executing some instructions. As the CPU cannot load
+instructions without running instructions, these first instructions are
+hardwired into the SoC directly, and the CPU is hardwired to start
+executing them when power comes on. This hardwired piece of code is
+often referred to as the ROM or romcode.
+
+The job of the romcode is to do very basic SoC setup and load further
+code to execute. To allow the romcode to do its job, it has access to a
+small amount of static RAM (SRAM, typically 64 to 128 kilobytes). The
+locations from which the romcode can load code are system-specific. On
+most modern ARM-based systems these will include at least
+(SPI-connected) flash (NAND/NOR), eMMC cards, SD cards, serial ports,
+etc. Most systems can also have code loaded over USB initially, while
+some can even load code directly over the network via BOOTP. The details
+of the format the code needs to be in (e.g. specific headers) and how
+the code is presented (e.g. specific offsets on the eMMC) are very
+system-specific. Once the romcode has managed to load the code from one
+of its supported locations into SRAM, execution of that code starts;
+this is the first time user-supplied code is actually run on the device.
+
+This next step is known under various different names, such as Boot
+Loader stage 1 (BL1), Secondary Program Loader (SPL), Tertiary Program
+Loader (TPL), etc. The code for this stage must be quite small, as only
+SRAM is available at this point. The goal of this stage is normally to
+initialize dynamic RAM (e.g. run DDR memory training), followed by
+loading the next stage into DRAM and executing it (which can be far
+bigger now that DRAM is available). Depending on the system, this stage
+may also provide initial user feedback that the system is booting (e.g.
+displaying a first splash image, turning an LED on, etc.), but that
+purely depends on the overall system design and the available space.
+
+What the next stage of executed code is, is more system-specific.
+In some cases it can directly be Linux, in some cases it will be a
+bootloader with more functionality (as all of main memory is now
+available), and in some cases there will be multiple loader steps. As an
+example of the last case, for devices using ARM Trusted Firmware there
+will typically be follow-on steps to load the secure firmware (such as
+[OP-TEE](op-tee.md)) followed by a non-secure world bootloader which
+loads Linux. For those interested, the various images used in an ATF
+setup can be found
+[here](https://trustedfirmware-a.readthedocs.io/en/latest/getting_started/image-terminology.html).
+
+Starting Linux is typically the last phase of the boot process. For
+Linux to start, the previous stage will have loaded a kernel image,
+optionally an initramfs and optionally a devicetree into main memory.
+The combination of these will load the root filesystem, at which point
+userspace (e.g. applications) will start running.
+
+
+Note that while the above is a simplified view of the basic boot
+process, the overall flow will be the same on all systems (both ARM and
+non-ARM devices). In the above we also implicitly assumed that only one
+CPU is booted; on some more complex systems multiple CPUs (e.g. the main
+application processors and various co-processors) might be booted. It
+may even be the case that all the early stages are done by a
+co-processor which takes care of loading the first code and starting the
+main processor. The overall description is also valid for systems with
+hypervisors: essentially the hypervisor is just another stage in the
+boot sequence and will load/start the code for each of the cells it
+runs.
+
+For this document we'll only look at securing the booting of the main
+(Linux-running) processor without a hypervisor.
+
+## Secure boot sequence
+
+The main objective of a secure boot process is to ensure that all code
+executed by the processor is trusted. As each of the stages described in
+the previous section is responsible for loading the code for the next
+stage, the solution is relatively straightforward: apart from loading
+the next stage of code, each stage also needs to verify the code it has
+loaded. Typically this is done by some signature verification mechanism.
+
+The ROM step is normally assumed to be fully trusted, as it is
+hard-wired into the SoC and cannot be replaced. How the ROM is
+configured and how it validates the next stage is highly
+device-specific. Later steps can do the verification either by calling
+back into ROM code (thus re-using the same mechanisms as the ROM) or by
+a pure software implementation (making it more consistent between
+different devices).
+
+In all cases, to support this, apart from device-specific configuration,
+all boot stages need to be appropriately signed. Luckily this is
+typically based on standard mechanisms such as RSA keys and X.509
+certificates.
+
+Once Linux starts, the approach has to be different, as on most systems
+it is not feasible to fully verify all of the root filesystem at boot
+time; this would simply take far too long. As such, the form of
+protection described thus far only gets applied up to the point the
+Linux kernel starts loading.
+
+## Threat models
+
+To understand what a secure boot system really secures, it's important
+to look at the related threat models. As a first step we can distinguish
+between offline attacks (device turned off) and online attacks (device
+powered on).
+
+For these considerations the assumption is made that all boot steps work
+as intended.
+As with any software, security vulnerabilities can invalidate the
+protection given. While in most cases these can be patched as issues
+become known, for ROM code this is impossible without a hardware change.
+
+### Offline attacks
+
+* Attack: Replace any of the boot stages on device storage (physical
+  access required)
+* Impact: Depending on the boot stage, the attacker can get full control
+  of the device for each following boot.
+* Mitigation: Assuming each stage correctly validates the next boot
+  stage, any tampering with loaded code will be detected and
+  prevented (e.g. device fails to boot).
+
+* Attack: Trigger the device to load software from external means (e.g.
+  USB or serial) under the attacker's control.
+* Impact: Depending on the boot stage, the attacker can get full control
+  of the device.
+* Mitigation: The ROM or any stage that loads from an external source
+  should use the same verification as for any on-device stages. However,
+  for production use, if possible, loading software from external
+  sources should be disabled.
+
+* Attack: Replace or add binaries on the system's root filesystem
+* Impact: Full control of the device as far as the kernel allows.
+* Mitigation: No protection from the above mechanisms.
+
+### Online attacks
+
+* Attack: Gain enough access to replace any of the boot stages on device
+  storage
+* Impact: Depending on the boot stage, the attacker can get full control
+  of the device for each following boot.
+* Mitigation: Assuming each stage correctly validates the next boot
+  stage, any tampering with loaded code will be detected and
+  prevented (e.g. device fails to boot).
+
+* Attack: Replace or add binaries on the system's root filesystem
+* Impact: Full control of the device as far as the kernel allows.
+* Mitigation: No protection from the above mechanisms.
+
+
+## Signing and signing infrastructure
+
+To securely boot a device, it is assumed that all the various boot
+stages have some kind of signature which can be validated by previous
+stages. By extension this also means the protection is only as strong as
+the signature: if an attacker can sign code under their control with a
+signature that is valid (or seen as valid) for the verifying step, all
+protection is lost. This means that special care has to be taken with
+respect to key handling, to ensure signing keys are kept with the right
+amount of security depending on their intended use.
+
+For development usage and devices, a low amount of security is
+acceptable in most cases: the intention in the development stage is for
+developers to be easily able to run their own code, and by extension
+they should be able to sign their own builds with minimal effort.
+
+For production devices, however, the requirements should be much more
+strict, as unauthorized control of a signing key can allow attackers to
+defeat the protection intended to be provided by secure boot.
+Furthermore, production devices should typically not be allowed to run
+development builds, as those tend to enable extra access for debugging
+and development reasons, which makes them a great attack vector.
+
+For these reasons it is recommended to have at least two different sets
+of signing keys: one for development usage and one for production use.
+Development keys can be kept with low security or even be publicly
+available, while production keys should *only* be used to sign final
+production images and should be managed by a hardware security module
+(HSM) for secure storage.
+To allow the usage of commercially available HSMs, it is recommended for
+the signing process to be able to support the
+[PKCS#11 standard](https://en.wikipedia.org/wiki/PKCS_11).
+
+Note that in case signing keys do get lost, stolen, etc., it is possible
+on some devices to revoke or update the valid set of keys. However, this
+can be quite limited: e.g. on i.MX6 devices one can *one-time* program
+up to four acceptable keys, and each of those can be flagged as revoked,
+but it is impossible to add more or replace any keys.
+
+## Apertis secure boot integration
+
+Integrating secure boot into Apertis really consists of two parts. The
+first part is to ensure all boot stages have the ability to verify the
+stage they load. The second part is to be able to sign all the boot
+stages as part of the Apertis image building process. While the actual
+implementation details of both will be system/hardware/SoC-specific, the
+impact is generic for all.
+
+As Apertis images are composed out of pre-built binary packages, the
+package delivering the implementation for the various boot stages should
+either provide a build which will always enforce signature verification
+*or* the implementation should detect whether the device is configured
+for secure boot and only enforce it in that situation. Enforcing on
+demand has the benefit that it makes it easier to test the same builds
+on non-secure devices (though care must be taken that the secure boot
+status cannot be faked).
+
+The signing of the various stages needs to be done at image build time,
+such that the signing key can be chosen based on the target: for
+example, whether it is a final production build, a development build, or
+even a production build to test on development devices. This in turn
+means that the signing tools and implementation need to support signing
+outside the build process, which is normally the case.
+
+## Apertis secure boot implementation steps
+
+As the whole process is somewhat device-specific, the implementation of
+a secure boot flow for Apertis should be done on a device-per-device
+basis. The best starting point is most likely the NXP i.MX6 SabreLite
+reference board, as its secure boot process (High Assurance Boot in NXP
+terms) is both well-known and well supported by upstream components.
+Furthermore, an initial PoC for the early boot stages was already done
+for the NXP Sabre Auto boards, which are based on the same SoC.
+
+## SabreLite secure boot preparation
+
+A [good introduction into HAB (High Assurance Boot)](https://boundarydevices.com/high-assurance-boot-hab-dummies/)
+has been prepared by Boundary Devices; there are also some [documentation](https://github.com/u-boot/u-boot/blob/master/doc/imx/habv4/introduction_habv4.txt)
+and examples in the U-Boot source tree.
+
+The [NXP Code Signing Tool](https://gitlab.apertis.org/pkg/development/imx-code-signing-tool)
+is needed to create the keys, certificates and SRK hashes used during
+the signing process -- please refer to [section 3.1.3 of the CST User's Guide](https://gitlab.apertis.org/pkg/development/imx-code-signing-tool/-/blob/apertis/v2021dev2/docs/CST_UG.pdf).
+Apertis reference images use the [public git repository](https://gitlab.apertis.org/infrastructure/apertis-imx-srk)
+with all secrets available, so it can be used for signing binaries
+during development in case the board has been fused with the Apertis SRK
+hash (an **irreversible operation!!!**).
+
+**_Caution_**: the SabreLite board can be fused with the SRK (Super Root
+Key) hash only once!
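+
+Because fusing is a one-time operation, it is worth checking the current
+state of the SRK fuses first. A small sketch of such a check follows;
+the `fuse read <bank> <word> <count>` syntax is standard U-Boot, but the
+exact output format may vary between versions (on an unfused board the
+SRK words typically read back as zero):
+
+```
+=> fuse read 3 0 8
+```
+
+The expected values to compare against can be printed on the host from
+the fuse file, for example with
+`hexdump -e '/4 "0x%08x\n"' SRK_1_2_3_4_fuse.bin`.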
+
+To fuse the [Apertis SRK hash](https://gitlab.apertis.org/infrastructure/apertis-imx-srk/-/blob/master/SRK_1_2_3_4_fuse.bin)
+we need a hexadecimal dump of the hash of the key. The command below
+produces the commands needed for fusing the Apertis SRK hash:
+```
+$ hexdump -e '/4 "0x"' -e '/4 "%X""\n"' SRK_1_2_3_4_fuse.bin | for i in `seq 0 7`; do read h; echo fuse prog -y 3 $i $h; done
+```
+
+This generates the list of commands to be executed in the U-Boot CLI.
+For fusing the Apertis SRK hash they are:
+```
+fuse prog -y 3 0 0xFD415383
+fuse prog -y 3 1 0x519690F5
+fuse prog -y 3 2 0xE844EB48
+fuse prog -y 3 3 0x179B1826
+fuse prog -y 3 4 0xEC0F8D7C
+fuse prog -y 3 5 0x2F209598
+fuse prog -y 3 6 0x9A98BE3
+fuse prog -y 3 7 0xAAD9B3D6
+```
+
+After executing the commands above, only the [Apertis development keys](https://gitlab.apertis.org/infrastructure/apertis-imx-srk/)
+can be used for signing the U-Boot binary.
+
+The i.MX6 ROM does signature verification of the bootloader during
+startup, and depending on the configured (fused) mode the behaviour is
+different. The i.MX6 device may work in 2 modes:
+- "open" -- the HAB ROM allows the use of unsigned bootloaders or
+  bootloaders signed with any key, without checking its validity.
+  In case of errors, it will only generate HAB secure events on boot,
+  without halting the process. This mode is useful for development.
+- "closed" -- only a U-Boot signed with the correct key may be started;
+  any incorrectly signed bootloader will not be started.
+  This mode should be used only for the final product.
+
+**It is highly recommended not to use "closed" mode for development
+boards!**
+
+To check whether your device has booted with a correctly signed
+bootloader, and whether the SRK key is fused, just type this in the
+U-Boot CLI:
+```
+=> hab_status
+
+Secure boot enabled
+
+HAB Configuration: 0xcc, HAB State: 0x99
+No HAB Events Found!
+```
+The output shows whether the device is in "closed" mode (secure boot
+enabled) and has booted without any security errors.
+
+In case of errors in "open" mode, the same command will show a list of
+HAB events similar to:
+```
+--------- HAB Event 5 -----------------
+event data:
+	0xdb 0x00 0x14 0x41 0x33 0x21 0xc0 0x00
+	0xbe 0x00 0x0c 0x00 0x03 0x17 0x00 0x00
+	0x00 0x00 0x00 0x50
+
+STS = HAB_FAILURE (0x33)
+RSN = HAB_INV_CERTIFICATE (0x21)
+CTX = HAB_CTX_COMMAND (0xC0)
+ENG = HAB_ENG_ANY (0x00)
+```
+
+During Linux kernel verification it is possible to emulate "closed" mode
+with the `fuse override` command and proceed with the boot:
+```
+=> fuse override 0 6 0x2
+=> run bootcmd
+```
+_Note_: the only issue with closed mode emulation is that the device
+will accept a kernel signed with any key, but HAB events will be
+generated and shown in that case.
+
+To close a device you need to fuse the same values used for overriding.
+
+**_Caution_**: after the step below, the board can only use bootloaders
+signed with the Apertis development key! This is an irreversible
+operation:
+```
+=> fuse prog 0 6 0x2
+```
+
+## Secure boot in the U-Boot package for SabreLite
+
+The U-Boot bootloader must be configured with the option
+`CONFIG_SECURE_BOOT` to enable HAB (High Assurance Boot) support on the
+i.MX6 platform.
+
+Upstream U-Boot has no protection based on the HAB engine to prevent
+executing unsigned binaries.
+Verified boot with the usage of the HAB ROM is enabled in U-Boot for
+Apertis only for the [FIT (Flattened uImage Tree)](https://github.com/u-boot/u-boot/blob/master/doc/uImage.FIT/source_file_format.txt)
+format, since it allows embedding the Linux kernel, initramfs and DTB
+into a single image. Hence support for FIT images must be enabled in the
+U-Boot configuration with the option `CONFIG_FIT`.
+
+The [patch series](https://gitlab.apertis.org/pkg/target/u-boot/-/merge_requests/4)
+enables verification of the FIT image prior to execution of the Linux
+kernel. The patched U-Boot verifies the whole FIT binary before
+extracting the kernel and initramfs images, and this ensures that only a
+verified initial system will be started.
+
+All other format types like zImage, as well as other boot methods, are
+prohibited on a fully secured device when "closed" mode is enabled or
+emulated.
+
+## Sign U-Boot bootloader such that the ROM can verify
+
+To sign the U-Boot for the SabreLite we need the `cst` tool installed in
+the system and the [Apertis development keys repository](https://gitlab.apertis.org/infrastructure/apertis-imx-srk)
+checked out. Please use the [csf/csf_uboot.txt](https://gitlab.apertis.org/infrastructure/apertis-imx-srk/-/blob/master/csf/csf_uboot.txt)
+file as a template for your U-Boot binary.
+
+U-Boot for the SabreLite board doesn't use an SPL, hence the whole
+`u-boot.imx` binary must be signed. With the `CONFIG_SECURE_BOOT` option
+enabled, the build log will contain the following output (also available
+in the file `u-boot.imx.log`):
+```
+Image Type: Freescale IMX Boot Image
+Image Ver: 2 (i.MX53/6/7 compatible)
+Mode: DCD
+Data Size: 606208 Bytes = 592.00 KiB = 0.58 MiB
+Load Address: 177ff420
+Entry Point: 17800000
+HAB Blocks: 0x177ff400 0x00000000 0x00091c00
+DCD Blocks: 0x00910000 0x0000002c 0x00000310
+```
+We need the values from the line starting with "HAB Blocks:". Those
+values must be used in the "[Authenticate Data]" section of the
+[template](https://gitlab.apertis.org/infrastructure/apertis-imx-srk/-/blob/master/csf/csf_uboot.txt):
+```
+[Authenticate Data]
+    Verification index = 2
+    Blocks = 0x177ff400 0x00000000 0x00091C00 "u-boot.imx"
+```
+
+To sign the U-Boot with the `cst` tool, simply call:
+```
+cst -i csf_uboot.txt -o csf_uboot.bin
+```
+The file `csf_uboot.bin` will contain the signatures, which should be
+appended to the original `u-boot.imx` binary:
+```
+cat u-boot.imx csf_uboot.bin > u-boot.imx.signed
+```
+
+### Sign U-Boot bootloader for loading via USB serial downloader
+
+In case something goes wrong and the system does not boot anymore, it is
+still possible to boot with the help of [USB serial downloaders](https://community.nxp.com/docs/DOC-95604)
+such as `imx_usb_loader` or `uuu`.
+
+However, the U-Boot binary must be signed in a slightly different way,
+since some changes are made to it by the ROM at runtime while it is
+being loaded. Please refer to the section "What about imx_usb_loader?"
+of the [High Assurance Boot (HAB) for dummies](https://boundarydevices.com/high-assurance-boot-hab-dummies/)
+document.
+
+The template [csf_uboot.txt](https://gitlab.apertis.org/infrastructure/apertis-imx-srk/-/blob/master/csf/csf_uboot.txt)
+for signing a U-Boot to be loaded over the serial downloader protocol
+should contain an additional block in the "[Authenticate Data]" section:
+```
+[Authenticate Data]
+    Verification index = 2
+    Blocks = 0x177ff400 0x00000000 0x00091C00 "u-boot.imx", \
+             0x00910000 0x0000002c 0x00000310 "u-boot.imx"
+```
+
+With the help of the [mod_4_mfgtool.sh](https://storage.googleapis.com/boundarydevices.com/mod_4_mfgtool.sh)
+script we temporarily clear, and afterwards restore, the DCD address of
+the original `u-boot.imx`, in addition to the signing itself:
+
+```
+sh mod_4_mfgtool.sh clear_dcd_addr u-boot.imx
+cst -i csf_uboot.txt -o csf_uboot.bin
+sh mod_4_mfgtool.sh set_dcd_addr u-boot.imx
+cat u-boot.imx csf_uboot.bin > u-boot.imx.signed_usb
+```
+
+## Sign kernel images for U-Boot to load
+
+After the successful startup of U-Boot we need to load the Linux kernel,
+initramfs and DTB file into memory. All these bits must be verified
+before control is transferred to the kernel. With the [FIT (Flattened uImage Tree)](https://github.com/u-boot/u-boot/blob/master/doc/uImage.FIT/source_file_format.txt)
+format we can use a single signed image with the kernel, initramfs and
+DTB embedded, which avoids "mix and match" attacks combining mismatched
+versions of kernel, initramfs, DTB and configuration.
+
+The signing procedure for kernel images is split into 2 parts:
+- preparation of the kernel image in FIT format
+- signing the FIT image
+
+### FIT image creation
+
+The [U-Boot documentation](https://github.com/u-boot/u-boot/tree/master/doc/uImage.FIT)
+contains a lot of details and examples of how to create FIT images for
+different purposes.
+
+To embed all bits into a single FIT image we need to prepare a file in
+the image tree source format; for Apertis we use a simple
+[template](https://gitlab.apertis.org/infrastructure/apertis-image-recipes/-/blob/apertis/v2021dev1/sign/imx6/fit_image.template)
+containing a configuration with 3 entries, for the kernel, initramfs and
+DTB respectively. The values `{{kernel}}`, `{{ramdisk}}` and `{{dtb}}`
+should be substituted with absolute or relative paths to the
+corresponding files; a sketch of what such a source file looks like is
+shown below.
+
+Please pay attention to the addresses in the `load` fields: since the
+whole FIT image is loaded into memory at address `0x12000000` (check the
+value of `kernel_addr_r` in the U-Boot environment), it is important to
+avoid intersections with the embedded binaries, since they will be
+copied to their configured memory regions after successful verification.
+
+To create the FIT image you need to have the `mkimage` command from the
+package `u-boot-tools`, compiled with FIT support.
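+
+As an illustration, a minimal image tree source along the lines of the
+Apertis template might look as follows. The `{{...}}` placeholders, node
+names, hash algorithm and load/entry addresses mirror the `mkimage`
+output shown below, but this is a sketch rather than the exact template
+content:
+
+```
+/dts-v1/;
+/ {
+    description = "Apertis armhf kernel with dtb and initramfs";
+    #address-cells = <1>;
+    images {
+        kernel-0 {
+            description = "Linux Kernel";
+            data = /incbin/("{{kernel}}");
+            type = "kernel";
+            arch = "arm";
+            os = "linux";
+            compression = "none";
+            load = <0x10800000>;
+            entry = <0x10800000>;
+            hash-1 { algo = "sha1"; };
+        };
+        ramdisk-0 {
+            description = "ramdisk";
+            data = /incbin/("{{ramdisk}}");
+            type = "ramdisk";
+            arch = "arm";
+            os = "linux";
+            compression = "none";
+            load = <0x15000000>;
+            hash-1 { algo = "sha1"; };
+        };
+        fdt-0 {
+            description = "Flattened Device Tree blob";
+            data = /incbin/("{{dtb}}");
+            type = "flat_dt";
+            arch = "arm";
+            compression = "none";
+            hash-1 { algo = "sha1"; };
+        };
+    };
+    configurations {
+        default = "conf-0";
+        conf-0 {
+            description = "Boot Apertis";
+            kernel = "kernel-0";
+            ramdisk = "ramdisk-0";
+            fdt = "fdt-0";
+            hash-1 { algo = "sha1"; };
+        };
+    };
+};
+```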
+With the FIT source file prepared, just run `mkimage` to generate the
+FIT binary:
+```
+$ mkimage -f vmlinuz.its vmlinuz.itb
+FIT description: Apertis armhf kernel with dtb and initramfs
+Created: Fri Mar 13 02:23:33 2020
+ Image 0 (kernel-0)
+  Description: Linux Kernel
+  Created: Fri Mar 13 02:23:33 2020
+  Type: Kernel Image
+  Compression: uncompressed
+  Data Size: 4526592 Bytes = 4420.50 KiB = 4.32 MiB
+  Architecture: ARM
+  OS: Linux
+  Load Address: 0x10800000
+  Entry Point: 0x10800000
+  Hash algo: sha1
+  Hash value: 8a64994bdab06d01450560ea229c9f44f1f0af14
+ Image 1 (ramdisk-0)
+  Description: ramdisk
+  Created: Fri Mar 13 02:23:33 2020
+  Type: RAMDisk Image
+  Compression: uncompressed
+  Data Size: 20285185 Bytes = 19809.75 KiB = 19.35 MiB
+  Architecture: ARM
+  OS: Linux
+  Load Address: 0x15000000
+  Entry Point: unavailable
+  Hash algo: sha1
+  Hash value: c12652573d1b301b191cf3e2a318913afc1ae4b7
+ Image 2 (fdt-0)
+  Description: Flattened Device Tree blob
+  Created: Fri Mar 13 02:23:33 2020
+  Type: Flat Device Tree
+  Compression: uncompressed
+  Data Size: 42366 Bytes = 41.37 KiB = 0.04 MiB
+  Architecture: ARM
+  Hash algo: sha1
+  Hash value: ace0dd1dea00568b1c4e6df3fb0420c912e3e091
+ Default Configuration: 'conf-0'
+ Configuration 0 (conf-0)
+  Description: Boot Apertis
+  Kernel: kernel-0
+  Init Ramdisk: ramdisk-0
+  FDT: fdt-0
+  Hash algo: sha1
+  Hash value: unavailable
+CSF Processed successfully and signed data available in vmlinuz.itb
+```
+
+### Signing the FIT image
+
+Now it is time to sign the produced image. The procedure is similar to
+signing U-Boot, with one additional step -- we need to add the **IVT**
+(Image Vector Table) for the kernel image. We skip this step for U-Boot
+since its IVT is prepared automatically during the build of the
+bootloader.
+
+The IVT is needed by the HAB ROM and must be part of the binary; it
+should be aligned to a `0x1000` boundary.
+For instance, if the produced binary is:
+```
+$ stat -c "%s" vmlinuz.itb
+25555173
+```
+we need to pad the file to the nearest aligned value, which is
+`25559040` (`0x1860000`, the next multiple of `0x1000`):
+```
+$ objcopy -I binary -O binary --pad-to=25559040 --gap-fill=0x00 vmlinuz.itb vmlinuz-pad.itb
+```
+
+The next step is the IVT generation for the FIT image, and the easiest
+method is to use the [`genIVT` script](https://storage.googleapis.com/boundarydevices.com/genIVT)
+provided by Boundary Devices, adapted for the padded FIT image:
+- Jump Location -- 0x12000000
+  Here we expect the image to be loaded by U-Boot
+- Self Pointer -- 0x13860000 (Jump Location + size of the padded image)
+  Pointer to the IVT itself, which is placed after the padded image
+- CSF Pointer -- 0x13860020 (Jump Location + size of the padded image + size of the IVT)
+  Pointer to the signature data, which is added after the IVT
+
+So, the IVT generation is pretty simple:
+```
+$ perl genIVT
+```
+It will generate a binary named `ivt.bin`, which is appended to the
+image:
+```
+$ cat vmlinuz-pad.itb ivt.bin > vmlinuz-pad-ivt.itb
+```
+
+We need to prepare the config file for signing the padded FIT image with
+the IVT. This step is exactly the same as for
+[U-Boot signing](#sign-uboot-bootloader-such-that-the-rom-can-verify).
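+
+Before filling in the CSF, it is worth double-checking the length that
+has to be authenticated: it covers the padded FIT image plus the 32-byte
+(`0x20`) IVT appended to it. A quick sketch of the check, reusing the
+sizes from the example above:
+
+```
+$ stat -c "%s" vmlinuz-pad-ivt.itb
+25559072
+$ printf '0x%x\n' 25559072
+0x1860020
+```
+
+This is exactly the length used in the `[Authenticate Data]` section
+below.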
+
+The configuration file for the FIT image is created from the template
+[csf_uboot.txt](https://gitlab.apertis.org/infrastructure/apertis-imx-srk/-/blob/master/csf/csf_uboot.txt),
+and the values in the `[Authenticate Data]` section must be the same as
+those used for the IVT calculation -- the Jump Location and the size of
+the generated file:
+```
+[Authenticate Data]
+    Verification index = 2
+    # Authenticate Start Address, Offset, Length and file
+    Blocks = 0x12000000 0x00000000 0x1860020 "vmlinuz-pad-ivt.itb"
+```
+
+At last we are able to sign the prepared FIT image:
+```
+$ cst -i vmlinuz-pad-ivt.csf -o vmlinuz-pad-ivt.bin
+CSF Processed successfully and signed data available in vmlinuz-pad-ivt.bin
+```
+
+## Signing bootloader and kernel from the image build pipeline
+
+Starting with v2021dev1, Apertis uses a single signed FIT kernel image
+for OSTree-based systems. The signed version of U-Boot is a part of the
+U-Boot installer.
+
+For signing binaries with the `cst` tool we need some files from the
+[Apertis development keys](https://gitlab.apertis.org/infrastructure/apertis-imx-srk)
+git repository. The minimal working setup should include only 6 files:
+- `SRK_1_2_3_4_table.bin` -- Super Root Keys table
+- `key_pass.txt` -- file with the password
+- the CSF certificate and key in PEM format
+- the IMG certificate and key in PEM format
+
+In addition we need a template for the FIT source file and a CSF
+template suitable for signing the U-Boot and the FIT kernel.
+
+All the files listed above are added into the git repository inside the
+[sign/imx6](https://gitlab.apertis.org/infrastructure/apertis-image-recipes/-/tree/apertis/v2021dev1/sign/imx6)
+subdirectory. Since all secrets for Apertis are public, we are able to
+use them directly from the repository; however, this is not acceptable
+for production.
+
+Fortunately, most CI tools have the ability to add files as secrets that
+are available only to certain steps. Hence we add the "private" keys and
+the password file as "Secret file" global credentials to demonstrate the
+integration into the Jenkins pipeline:
+
+
+
+The keys should be available during the call of the `cst` tool, so we
+have to make the Jenkins pipeline copy these secret files, with the same
+names as used in the [CSF template](https://gitlab.apertis.org/infrastructure/apertis-image-recipes/-/blob/apertis/v2021dev1/sign/imx6/fit_image_csf.template),
+and remove them after use.
+
+For instance, a simple way to copy the secrets in Jenkins:
+```
+withCredentials([ file(credentialsId: csf_csf_key, variable: 'CSF_CSFKEY'),
+                  file(credentialsId: csf_img_key, variable: 'CSF_IMGKEY'),
+                  file(credentialsId: csf_key_pass, variable: 'CSF_PASSWD')]) {
+    // Setup keys for cst tool from Jenkins secrets
+    // Have to keep keys and password file near certificates
+    sh(script: """
+        cd ${WORKSPACE}/sign/imx6
+        cp -af $CSF_CSFKEY ./
+        cp -af $CSF_IMGKEY ./
+        cp -af $CSF_PASSWD ./""")
+}
+```
+
+### U-Boot signing
+
+To sign the U-Boot, the script [scripts/sign-u-boot.sh](https://gitlab.apertis.org/infrastructure/apertis-image-recipes/-/blob/apertis/v2021dev1/scripts/sign-u-boot.sh)
+has been added. It automatically generates the CSF configuration from
+the template [sign/imx6/fit_image_csf.template](https://gitlab.apertis.org/infrastructure/apertis-image-recipes/-/blob/apertis/v2021dev1/sign/imx6/fit_image_csf.template)
+and calls the `cst` tool to sign the U-Boot binary.
+
+The script is called by the [Debos recipe for the SabreLite U-Boot installer
+image](https://gitlab.apertis.org/infrastructure/apertis-image-recipes/-/blob/apertis/v2021dev1/mx6qsabrelite-uboot-installer.yaml):
+```
+  - action: run
+    description: Sign U-Boot
+    script: scripts/sign-u-boot.sh "${ROOTDIR}/deb-binaries/usr/lib/u-boot/{{ $target }}/u-boot.imx"
+```
+
+### FIT image creation and signing
+
+The FIT image is more complex, so for Apertis we use 2 scripts:
+- the [`scripts/generate_signed_fit_image.py` script](https://gitlab.apertis.org/infrastructure/apertis-image-recipes/-/blob/apertis/v2021dev1/scripts/generate_signed_fit_image.py)
+  is used for FIT image generation, padding, IVT calculation and
+  signing. This script can be used standalone to automate all the steps
+  described in the section
+  "[Sign kernel images for U-Boot to load](#sign-kernel-images-for-uboot-to-load)"
+- the [`scripts/generate_fit_image.sh` script](https://gitlab.apertis.org/infrastructure/apertis-image-recipes/-/blob/apertis/v2021dev1/scripts/generate_fit_image.sh)
+  is a wrapper for the former, providing it with the paths to the
+  kernel, initramfs and DTB to include in the signed FIT image.
+
+The integration with the build pipeline happens **after** the kernel is
+installed by the [OSTree commit recipe](https://gitlab.apertis.org/infrastructure/apertis-image-recipes/-/blob/apertis/v2021dev1/apertis-ostree-commit.yaml),
+by adding the step below:
+```
+  - action: run
+    description: Generate FIT image
+    script: scripts/generate_fit_image.sh
+```
+
+**NB**: this action must be done prior to the ostree commit action, in
+order to add the signed FIT kernel into the OSTree repository for OTA
+upgrades.
+
+## Next steps
+
+As next steps, the following could be undertaken:
+* Integration of PKCS#11 support in the signing process, to support HSM
+  devices
+* Automated testing of secure boot, if possible
diff --git a/content/designs/security.md b/content/designs/security.md
new file mode 100644
index 0000000000000000000000000000000000000000..bc280dd3a6af7d87f9b9297fc3c1d434f817283c
--- /dev/null
+++ b/content/designs/security.md
@@ -0,0 +1,2162 @@
+---
+title: Security
+short-description: Discussing and detailing solutions for the security requirements of the system
+  (general-design)
+authors:
+  - name: Felipe Zimmerle
+  - name: Mateu Batle
+---
+
+# Security
+
+## Overview
+
+This document discusses and details solutions for the security
+requirements of the Apertis system.
+
+[][Security boundaries and threat model] describes the various
+aspects of the security model, and the threat model for each.
+
+Local attacks to obtain private data or damage the system, including
+those performed by malicious applications that get installed on the
+device somehow or through exploiting a vulnerable application, are
+covered in [][Mandatory access control] (MAC). The MAC infrastructure is
+also the main line of defense against malicious email attachments and
+web content, and the main tool for minimizing the damage that root is
+able to do. It is the main security infrastructure of the system, and
+the depth of the discussion is proportional to its importance.
+
+Denial of Service attacks through abuse of system resources such as CPU
+and memory are covered by [][Resource usage control].
+Attacks coming in through the device's network connections, and possible
+strategies for firewall setup, are covered in [][Network filtering].
+
+Attacks on the driver assistance system coming from the infotainment
+system are handled by many of these security components, so they are
+discussed in a separate section: [][Protecting the driver assistance system from attacks].
+Internet threats are the main subject of
+[][Protecting the system from internet threats].
+
+[][Secure software distribution] discusses how to provide ways
+to make installing and upgrading software secure, by guaranteeing
+packages are unchanged, undamaged and coming from a trusted repository.
+
+Secure boot, for protecting the system against attacks requiring
+physical access to the device, is discussed in [][Secure boot].
+[][Data encryption and removal] is concerned with features whose main
+focus is to protect the privacy of the user.
+
+[][Stack protection] discusses simple but effective techniques that can
+be used to harden applications and prevent exploitation of
+vulnerabilities. [][Confining applications in containers] discusses the
+pros and cons of using the lightweight Linux Containers infrastructure
+for a system like Apertis.
+
+[][The IMA Linux integrity subsystem] wraps up this document by
+discussing how the Integrity Measurement Architecture works, what
+features it brings to the table, and at what cost.
+
+## Terminology
+
+### Privilege
+
+A component that is able to access data that other components cannot is
+said to be ***privileged***. If two components have different privileges
+– that is, at least one of them can do something that the other cannot –
+then there is said to be a ***privilege boundary*** between them.
+
+### Trust
+
+A ***trusted*** component is a component that is technically able to
+violate the security model (i.e. it is relied on to enforce a privilege
+boundary), such that errors or malicious actions in that component could
+undermine the security model. The ***trusted computing base (TCB)*** is
+the set of trusted components. This is independent of its quality of
+implementation – it is a property of whether the component is relied on
+in practice, and not a property of whether the component is
+***trustworthy***, i.e. safe to rely on. For a system to be secure, it
+is necessary that all of its trusted components be trustworthy.
+
+One subtlety of [Apertis' app-centric design](applications.md) is that there is a
+privilege boundary between *application bundles* even within the context
+of one user. As a result, a multi-user design has two main layers in its
+security model: system-level security that protects users from each
+other, and user-level security that protects a user's apps from each
+other. Where we need to distinguish between those layers, we will refer
+to the ***TCB for security between users*** or the ***TCB for security
+between app bundles***
+respectively.
+
+### Integrity, confidentiality and availability
+
+Many documents discussing security policies divide the desired security
+properties into integrity, confidentiality and availability. The
+definitions used here are taken from the USA National Information
+Assurance Glossary.
+
+> Committee on National Security Systems, CNSS Instruction No. 4009
+> National Information Assurance (IA) Glossary, April 2010.
+> <http://www.ncsc.gov/publications/policy/docs/CNSSI_4009.pdf>
+
+***Integrity*** is the property that data has not been changed,
+destroyed, or lost in an unauthorized or accidental manner. For example,
+if a malicious application altered the user's contact list, that would
+be an integrity failure.
+
+***Confidentiality*** is the property that information is not disclosed
+to system entities (users, processes, devices) unless they have been
+authorized to access the information. For example, if a malicious
+application sent the user's contact list to the Internet, that would be
+a confidentiality failure.
+
+***Availability*** is the property of being accessible and usable upon
+demand by an authorized entity. For example, if an application used so
+much CPU time, memory or disk space that the system became unusable (a
+denial of service attack), or if a security mechanism incorrectly denied
+access to an authorized entity, that would be an availability failure.
+
+## Security boundaries and threat model
+
+This section discusses the security properties that we aim to provide.
+
+### Security between applications
+
+The Apertis platform provides for installation of *application bundles*,
+which may come from the platform developer or third parties. These are
+described in the Applications design document.
+
+Our model is that there is a trust boundary between these application
+bundles, providing confidentiality, integrity and availability. In other
+words, an application bundle should not normally be able to read data
+stored by another application bundle, alter or delete data stored by the
+other application bundle, or interfere with the operation of the other
+application bundle. As a necessary prerequisite for those properties,
+processes from an application bundle must not be able to gain the
+effective privileges of processes or programs from another application
+bundle (privilege escalation).
+
+In addition to the application bundles, the Apertis *platform* (defined
+in the Applications design document, and including libraries, system
+services, and any user-level services that are independent of
+application bundles) has higher privilege than any particular
+application bundle. Similarly, an application bundle should not in
+general be able to read, alter or delete non-application data stored by
+the platform, except where the application bundle has been granted
+permission to do so, such as a navigation application reading location
+data (a “least-privilege” approach); and the application bundle must not
+be able to gain the effective privileges of processes or programs from
+the platform.
+
+The threat model here is to assume that a user installs a malicious
+application, or an application that has a security flaw leading to an
+attacker being able to gain control over it. The attacker is presumed to
+be able to execute arbitrary code in the context of the application.
+
+Our requirement is that the damage that can be done by such applications
+is limited to: reading files that are non-sensitive (such as read-only
+OS resources) or are specifically shared between applications; editing
+or deleting files that are specifically shared between applications;
+reducing system performance, but to a sufficiently limited extent that
+the user is able to recover by terminating or uninstalling the malicious
+or flawed application; or taking actions that the application requires
+for its normal operation.
+
+Some files, particularly large media files such as music, might be
+specifically shared between applications; such files do not have any
+integrity, confidentiality or availability guarantees against a
+malicious or subverted application. This is a trade-off for usability,
+similar to Android's Environment.getExternalStorageDirectory().
+
+To apply this security model to new platform services, it is necessary
+for those platform services to have a coherent security model, which can
+be obtained by classifying any data stored by those platform services
+using questions similar to these:
+
+ - Can it be read by all applications, applications with a specific
+   privilege flag, specific applications (for example the application
+   that created it), or by some combination of those?
+
+ - Can it be written by all applications, applications with a specific
+   privilege flag, specific applications, or some combination of those?
+
+It is also necessary to consider whether data stored by different users
+using the same application must be separated (see [][Security between users]).
+
+For example, a platform service for downloads might have the policy that
+each application's download history can be read by the matching
+application, or by applications with a “Manage Downloads” privilege
+(which might for instance be granted to a platform Settings
+application).
+
+As another example, a platform service for app-bundle installation might
+have a policy stating that the trusted “Application Installer” HMI is the only
+component permitted to install or remove app-bundles. Depending on the
+desired trade-off between privacy and flexibility, the policy might be
+that any application may read the list of installed app-bundles, that
+only trusted platform services may read the list of installed
+app-bundles, or that any application may obtain a subset of the list
+(bundles that are considered non-sensitive) but only trusted platform
+services may read the full list.
+
+A service can be considered to be secure if it implements its security
+policy as designed, and that security policy is appropriate to the
+platform's requirements.
+
+### Communication between applications
+
+In a system that supports capabilities such as data handover between
+applications, it is likely that pairs of application bundles can
+communicate with each other, either mediated by platform services or
+directly. The [Interface Discovery] and [Data Sharing] designs on
+the Apertis wiki have more information on this topic.
+
+The mechanisms for communicating between application bundles, or between
+application bundle and the platform, are to be classified into *public*
+and *non-public* interfaces. Application bundles may enumerate all of
+the providers of *public* interfaces and may communicate with those
+providers, but it is not acceptable for application bundles to enumerate
+or communicate with the providers of *non-public* interfaces. The
+platform is considered to be trusted, and may communicate with any
+*public* or *non-public* interface.
+
+The security policy described here is one of many possible policies that
+can be implemented via the same mechanisms, and could be replaced or
+extended with a finer-grained security policy at a later date, for
+example one where applications can be granted the capability to
+communicate with some but not all non-public interfaces.
+
+### Security between users
+
+The Apertis platform is potentially a multi-user environment; see the
+Multiuser design document for full details.
+This results in a two-level
+hierarchy: users are protected from each other, and within the context
+of a user, apps are protected from other apps.
+
+In at least some of the possible multi-user models described in the
+Multiuser design document, there is a trust boundary between users,
+again providing confidentiality, integrity and availability (see above).
+Once again, privilege escalation must be avoided.
+
+As with security between applications, some files (perhaps the same
+files that are shared between applications) might be specifically shared
+between users. Such files do not have any integrity, confidentiality or
+availability guarantees against a malicious user. Android's
+Environment.getExternalStorageDirectory() is one example of a storage
+area shared by both applications and users.
+
+### Security between platform services
+
+Within the platform, not all services and components require the same
+access to platform data.
+
+Some platform components, notably the Linux kernel, are sufficiently
+highly-privileged that it does not make sense to attempt to restrict
+them, because carrying out their normal functionality requires
+sufficiently broad access that they can violate one of the layers of the
+security model. As noted in [][Terminology], these components are
+said to be part of the *trusted computing base* for that layer; the
+number and size of these components should be minimized, to reduce the
+exposure of the system as a whole.
+
+The remaining platform components have considerations similar to those
+applied to applications: they should have “least privilege”. Because
+platform components are part of the operating system image, they can be
+assumed not to be malicious; however, it is desirable to have “defence
+in depth” against design or implementation flaws that might allow an
+attacker to gain control of them. As such, the threat model for these
+components is that we assume an attacker gains control over the
+component (arbitrary code execution), and the desired property is that
+the integrity, confidentiality and availability impact is minimized,
+given the constraint that the component's privileges must be sufficient
+for it to carry out its normal operation.
+
+Note that the concept of the trusted computing base applies to each of
+the two layers of the security policy. A system service that
+communicates with all users might be part of the TCB for isolation
+between users, but not part of the TCB for isolation between platform
+components or between applications. Conversely, a per-user service such
+as dconf might be part of the TCB for isolation between applications,
+but not part of the TCB for isolation between users. The Linux kernel is
+one example of a component that is part of the TCB for both layers.
+
+### Security between the device and the network
+
+Apertis devices may be connected to the Internet, and should protect
+confidentiality and integrity of data stored on the Apertis device. The
+threat model here is that an attacker controls the network between the
+Apertis device and any Internet service of interest, and may eavesdrop
+on network traffic (passive attack) and/or substitute spoofed network
+traffic (active attack); we assume that the attacker does not initially
+control platform or application code running on the Apertis device. Our
+requirement is that normal operation of the Apertis device does not
+result in the attacker gaining the ability to read or change data on
+that device.
+
+### Physical security
+
+An attack that could be considered is one where the attacker gains
+physical access to the Apertis system, for example by stealing the car
+in which it is installed. It is obviously impossible to guarantee
+availability in this particular threat model (the attacker could steal
+or destroy the Apertis system), but it is possible to provide
+confidentiality, via encryption “at rest”.
+
+A variation on this attack is to assume that the attacker has physical
+access to the system and then returns it to the user, perhaps
+repeatedly. This raises the question of whether integrity is provided
+(whether the user can be sure that they are not subsequently entering
+confidential data into an operating system that has been modified by the
+attacker).
+
+This type of physical security can come with a significant performance
+and complexity overhead; as a trade-off, it could be declared to be
+out-of-scope.
+
+## Solutions adopted by popular platforms
+
+As background for the discussions of this document, the following
+sections provide an overview of the approaches other mobile platforms
+have chosen for security, including an explanation of the trade-offs or
+assumptions where necessary.
+
+### Android
+
+Android uses the Linux kernel, and as such relies on it being secure
+when it comes to the most basic security features of modern operating
+systems, such as process isolation and an access permissions model. On
+top of that, Android has a Java-based virtual machine environment which
+runs regular applications and provides them with APIs that have been
+designed specifically for Android. Regular applications can execute
+arbitrary native code within their application sandbox, for example by
+using the NDK interfaces.
+
+> <https://developer.android.com/training/articles/security-tips.html#Dalvik>
+> notes that “On Android, the Dalvik VM is not a security boundary”.
+
+However, some system functionality is not
+directly available within the application sandbox, but can be accessed
+by communicating with more-privileged components, typically using
+Android's Java APIs.
+
+Early versions of Android worked under the assumption that the system
+would be used by a single user, and no attempt was made towards
+supporting any kind of multi-user use case. Based on this assumption,
+Android re-purposed the concept of UNIX user ID (uid), making each
+application run as a different user ID. This allows for very tight
+control over what files each application is able to access by simply
+using user-based permissions; this provides isolation between
+applications ([][Security between applications]). In later Android versions, which do have
+multi-user support, user IDs are used to provide two separate security
+boundaries – isolating applications from each other, and isolating users
+from each other ([][Security between users]) – with one user ID per (user, app) pair.
+This is discussed in more detail in the [Multiuser design document](multiuser.md).
+
+The system's main file system is mounted read-only to protect against
+unauthorized tampering with system files (integrity for platform data,
+[][Security between platform services]); however, this does not protect integrity against an
+attacker with physical access ([][Physical security]). Encryption of the user data
+partition through the standard *dm-crypt* kernel facility
+(confidentiality despite physical access, [][Physical security]) is supported if the user
+configures a password for their device.
Users using gesture-based or
other unlock mechanisms are unable to use this feature.

The root user on Android is all-powerful, and can do anything to the
system. Android makes no attempt to limit the power of processes running
as UID 0 (the root user ID); in other words, they are part of the TCB.
The security of system services, of the core system and of applications
relies on the separation of users already discussed, and on the
assumption that nothing other than the essentials (the kernel itself and
a very small number of system services) runs with root privileges.

Older versions of Android did not use Mandatory Access Control,
discussed in [][Mandatory Access Control]. More recent versions use SELinux
to augment the uid-based sandbox.

> Security-Enhanced Linux in Android,
> https://source.android.com/devices/tech/security/selinux/

The idea of restricting the services an application can use to those specified in the application's
manifest also exists in Android. Before installation, Android shows a
list of system services the application intends to access, and
installation only proceeds if the user agrees. This differs slightly
from the [Applications design in Apertis](applications.md), in which some permissions
are subject to prompting similar to Android's, while other permissions
are checked by the app store curator and unconditionally granted on
installation.

Android provides APIs to verify a process has a given permission, but no
central control is built into the API layer or the IPC mechanism as
planned for Apertis – checking whether a caller has the required
permissions to make that call is left to the service or application that
provides the IPC interface or API, similar to how most GNOME
services work by using [PolicyKit]\(see [][polkit (PolicyKit)] for more on this
topic).

> See, for instance, how the A2DP service verifies the caller has the
> required permission:
> <https://github.com/android/platform_frameworks_base/blob/master/core/java/android/server/BluetoothA2dpService.java#L257>

No effort is made specifically towards thwarting applications
misbehaving and causing a Denial of Service on system services or the
IPC mechanism. Android uses two very simple strategies to forcibly stop
an application: 1) it kills applications when the device is out of
memory; 2) it notifies the user of [unresponsive applications][Android-responsiveness] and allows
them to force the application to close, similar to how GNOME does.

> An application is deemed to not be responding after about 5 seconds of
> not being able to handle user input. This feature is implemented by the
> Android window manager service, which is responsible for dispatching
> events read from the kernel input events interface (the files under
> **/dev/input**) to the application, in cooperation with the activity
> manager service, which shows the application not responding dialog and
> kills the application if the user decides to close it. After dispatching
> an event, the window manager service waits for an acknowledgement from
> the application with a timeout; if the timeout is hit, then the
> application is considered not responding.

### Bada

Bada is not an Open Source platform, so closer inspection of the inner
workings is not feasible. However, the documentation indicates that Bada
also kills applications when under memory pressure.

It also uses a simple *API privilege level* framework as the base of its
security and reliability architecture.
Applications running with the
*Normal* API privilege level need to specify in their manifest file
which *[API privilege groups][bada-privileged-api]* they need to be
able to access.

Some APIs are restricted under the *System* API level and can be used
only by Samsung or its authorized partners. It's not possible to say
whether those restrictions are applied in a general way or by having the
modules that provide the APIs perform validation checks, but the latter
seems more likely given these are C++ APIs that do not go through any
kind of central service.

### iOS

iOS is, like Bada, a closed platform, so [details are sometimes difficult
to obtain][iOS-sec], but Apple does use some Open Source components (at the lower
levels, in particular). iOS has an [application sandbox][iOS-sandbox] that is
very similar in functionality to AppArmor, discussed below. The
technology is based on Mandatory Access Control provided by the
[TrustedBSD] project and has been marketed under the *Seatbelt*
name.

Like AppArmor, it uses configuration files that specify profiles, using
path-based rules for file system access control. Also like AppArmor,
other functionality such as network access can be controlled. The actual
confinement is applied when the application uses system calls to request
that the kernel carries out an action on the application's behalf (in
other words, when the privilege boundary between user-space and the
kernel is crossed).

Seatbelt is considered to be the single canonical solution to sandboxing
applications on iOS; this is in contrast with Linux, in which AppArmor
is one option among many (system calls can be mediated by seccomp, the
[Secure Computing API][lwn-secure-computing] described in [][Seccomp], in
addition to up to one MAC layer such as AppArmor, SELinux or Smack).

None of this complexity is exposed to apps developed for iOS, though;
these are merely implementation details.

Apparently, there are no central controls whatsoever protecting the
system from applications that hang or try to DoS system services. The
only real limitation imposed is the available system memory.

Applications are free to use any available APIs; there is no explicit
declarative permission system like the one used in Android. However,
some functionality is always mediated by the system, including through
system-controlled UI.

For instance, an application can query the GPS for location; when that
happens, the system will take over and present the user with a request
for permission. If the user accepts, the request will be successful and
the application will be white-listed for future queries. The same goes
for interacting with the camera: the application can request a picture
be taken, but the UI that is presented for taking the picture is
controlled by the system, as is the actual interaction with the camera.

This is analogous to the way in which Linux services can use PolicyKit
to mediate privileged actions (see [][polkit (PolicyKit)]), although on iOS the
authorization step is specifically considered to be an implementation
detail of the API used, whereas some Linux services do make the calling
application aware of whether there was an interactive authorization
step.

## Mandatory Access Control

The goal of the Linux Discretionary Access Control (DAC) is a separation
of multiple users and their data ([][Security between users],
[][Security between platform services]).
The policies are
based on the identity of a subject or their groups. Since in Apertis
applications from the same user should not trust each other ([][Security between applications]),
the utilization of a Mandatory Access Control (MAC) system is
recommended. MAC is implemented in Linux by one of the available Linux
Security Modules (LSM).

### Linux Security Modules (LSM)

Due to the different nature and objectives of various security models
there is no real consensus about which security model is the best, so
support for loading different security models and solutions became
available in Linux in 2001. This mechanism is called Linux Security
Modules (LSM).

Although it is in theory possible to provide generic support for any
LSM, in practice most distributions pick one and stick to it, since both
policies and threat models are very specific to any particular LSM
module.

The first implementation on top of LSM was SELinux, developed by the US
National Security Agency (NSA). In 2009 the TOMOYO Linux module was also
included in the kernel, followed by AppArmor in 2010. The
sub-sections below give a short introduction to the security models
that are officially supported by the Linux Kernel.

#### SELinux

[SELinux] is one of the most well-known LSMs. It is supported by
default in Red Hat Enterprise Linux and Fedora. It is infamous for how
difficult it is to maintain its security policies; however, being the
most flexible and not having any limitation regarding what it can label,
it is the reference in terms of features. For every user or process,
SELinux assigns a context which consists of a role, user name and
domain/type. The circumstances under which the user is allowed to enter
into a certain domain must be configured into the policies.

SELinux works by applying rules defined by a policy when kernel-mediated
actions are taken. Any file-like object in the system, including files,
directories, and network sockets can be labeled. Those labels are set on
file system objects using extended file system attributes. That can be
problematic if the file system that is being used in a given product or
situation lacks support for extended attributes. While support has been
built for storing labels in frequently used networking file systems like
NFS, usage in newer file systems may be challenging. Note that BTRFS
does support extended attributes.

Users and processes also have labels assigned to them. Labels can be of
a more general kind like, for instance, the sysadm\_t label, which is
used to determine that a given resource should be accessible to system
administrators, or of a more specific kind.

Locking down a specific application, for instance, may involve creating
new labels specifically for its own usage. A label “browser\_cache\_t”
may be created, for instance, to protect the browser cache storage. Only
applications and users which have that label assigned to them will be
able to access and manage those files. The policy will specify that any
files created by the browser on that specific directory are assigned
that label automatically.

Labels are automatically applied to any resources created by a process,
based on the labels the process itself has, including sockets, files,
devices represented as files and so on.
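
As an illustration, labels can be inspected and manipulated from the
command line; this is a small sketch with made-up output, reusing the
hypothetical browser cache label from above:

```shell
# Inspect the label (user:role:type) attached to the cache directory
$ ls -dZ /var/cache/browser
system_u:object_r:browser_cache_t:s0 /var/cache/browser

# Relabel a stray file by hand; in normal operation the policy applies
# the label automatically when the browser creates files there
$ chcon -t browser_cache_t /var/cache/browser/stray-thumbnail.png
```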
Like other MAC systems,
SELinux is not designed to impose performance-related limitations, such as
specifying how much CPU time a process may consume, or how many times a
process duplicates itself, but it supports virtually everything in the
area it was designed to target.

The SELinux support built into D-Bus allows enhancement of the existing
D-Bus security rules by associating names, methods and signals with
SELinux labels, thus bringing similar policy-making capabilities to
D-Bus.

#### TOMOYO Linux

[TOMOYO Linux] focuses on the behavior of a system where every
process is created with a certain purpose, and allows each process to
declare the behaviors and resources needed to achieve its purpose. TOMOYO
Linux is not officially supported by any popular Linux distribution.

#### SMACK

Simplicity is the primary design goal of [SMACK]. It was used by
MeeGo before that project was cancelled; [Tizen] appears to be the
only general-purpose Linux distribution using SMACK as of 2015.

SMACK works by assigning labels to the same kinds of system objects and
processes as SELinux does; similar capabilities were proposed by Intel for
D-Bus integration, but their originators did not follow up on
[reviews][SMACK-reviews], and the changes were not merged. SMACK also relies on
extended file system attributes for the labels, which means it suffers
from the same shortcomings as SELinux in that regard.

There are a few special predefined labels, but the administrator can
create and assign as many different labels as desired. The rules
regarding what a process with a given label is able to perform on an
object with another given label are specified in the system-wide policy
file /etc/smack/accesses, or can be set at run-time using the smackfs
virtual file system.

MeeGo used SMACK by assigning a separate label to each service in the
system, such as “Cellular” and “Location”. Every application would get
its own label, and on installation the packaging system would read a
manifest that listed the systems the application would require, and
SMACK rules would then be created to allow those accesses.

#### AppArmor

Of all LSM modules that were reviewed, Application Armor
([AppArmor]) can be seen as the most focused on application
containment.

AppArmor allows the system administrator to associate an executable with
a given profile in order to limit access to resources. These resource
limitations can be applied to network and file system access and other
system objects. Unlike SMACK and SELinux, AppArmor does not use extended
file system attributes for storing labels, making it file system
agnostic.

Also in contrast with SELinux and SMACK, AppArmor does not have a
system-wide policy, but application profiles, associated with the
application binaries. This makes it possible to disable enforcement for
a single application, for instance. In the event of shipping a policy
with an error that leads to users not being able to use an application,
it is possible to quickly restore functionality for that application
without disabling the security for the system as a whole, while the
incorrect profile is fixed.

Since AppArmor uses the path of the binary for profile selection,
changing the path through manipulation of the file system namespace
(i.e. through links or mount points) is a potential way of
working around the limits that are put in place; while this is cited as
a weakness, in practice it is not an issue, since restrictions exist to
block anyone trying to do this. Creation of symbolic links is only
allowed if the process doing so is allowed to access the original file,
and links are followed to enforce any policy assigned to the binary they
link to. Confined processes are also not allowed to mount file systems
unless they are given explicit permission.

Here's an example of how restricting ping's ability to create raw
sockets cannot be worked around through linking – lines beginning with $
represent commands executed by a normal user, and those starting with \#
have been executed by the root user:

```shell
$ ping debian.org
ping: icmp open socket: Operation not permitted
$ ln -s /bin/ping
$ ./ping debian.org
ping: icmp open socket: Operation not permitted
$ ln /bin/ping ping2
ln: failed to create hard link `ping2' => `/bin/ping': Operation not permitted
# ping debian.org
ping: icmp open socket: Operation not permitted
# ln -s /bin/ping /bin/ping2
# ping2 debian.org
ping: icmp open socket: Operation not permitted
#
```

> AppArmor restriction applying to file system links

Copying the file would make it not trigger the containment. However,
even if the user were able to symlink the binary or use mount points to
work around the path-based restrictions, that would not mean privilege
escalation, given the white-list approach that is being adopted. That
approach means that any binary escaping its containment profile would in
actuality be dropping privileges, not escalating them, since the
restrictions imposed on binaries that do not have their own profile can
be quite extensive.

Note that Collabora is proposing that partitions which should only
contain data be mounted with the option that disallows execution of code
contained in them, so even if the user managed to escape the strict
containment of the user session and copy a binary to one of the
directories they have write access to, they would not be able to run it.
Refer to the System updates & rollback and Application designs for more
details on file system and partition configuration.

Integration with D-Bus was developed by Canonical and shipped in Ubuntu
for several years, before being merged upstream in dbus-daemon 1.9 and
AppArmor 2.9. The implementation includes patches to AppArmor's
user-space tools, to make the new D-Bus rules known to the profile
parser, and to dbus-daemon, so that it will check with AppArmor before
allowing a request.

AppArmor will be used by shipping profiles for all components of the
platform, and by requiring that third-party applications ship with
their own profiles that specify exactly what requests the application
should be allowed to make.

Creating a new profile for AppArmor is a reasonably simple process: a
new profile is generated automatically by running the program under
AppArmor's profile generator, [aa-genprof], and exercising its
features so that the profile generator can capture all of the accesses
the application is expected to make. After the initial profile has been
generated, it must be reviewed and fine-tuned by manual editing to make
sure the permissions that are granted are not beyond what is expected.
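
A sketch of that workflow, for a hypothetical application binary:

```shell
# Start the profile generator for the application (path is hypothetical)
$ sudo aa-genprof /usr/bin/mediaplayer

# In another terminal, exercise the application's features; then, back
# in aa-genprof, scan the logs and answer the prompts to allow or deny
# each recorded access, and finish to write the profile out

# Review and fine-tune the generated profile by hand afterwards
$ sudoedit /etc/apparmor.d/usr.bin.mediaplayer
```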
In AppArmor there is no default profile applied to all processes, but a
process always inherits the limitations imposed on its parent. Setting up a
proper profile for components such as the session manager is a practical
and effective way of implementing this requirement.

### Comparison

Since all those Linux Security Modules rely on the same kernel API and
have the same overall goals, the features and resources they are able to
protect are very similar, so not much time will be spent covering them.
The policy format and how control over the system and
its components is exerted varies from framework to framework, though,
which leads to different limitations. The table below has a summary of
features, simplicity and limitations:

| | SELinux | AppArmor | SMACK |
| -------------------- | ------------ | --------------- | ----- |
| Maintainability | Complex | Simple | Simple |
| Profile creation | Manual/Tools | Manual/Tools | Manual |
| D-Bus integration | Yes | Yes | Not proposed upstream |
| File system agnostic | No | Yes | No |
| Enforcement scope | System-wide | Per application | System-wide |

> Comparison of LSM features

Historically LSM modules have focused on kernel-mediated accesses, such
as access to file system objects and network resources. Modern systems,
though, have several important features being managed by user-space
daemons. D-Bus is one such daemon and is especially important since it is
the IPC mechanism used by those daemons and applications for
communication. There is clear benefit in allowing D-Bus to cooperate
with the LSM to restrict which applications can talk to which services
and how.

In that regard SELinux and AppArmor have an advantage, since D-Bus is able
to let these frameworks decide whether a given communication should be
allowed or not, and whether a given process is allowed to acquire a
particular name on the bus. Support for SMACK mediation was worked on by
Intel for use in Tizen, but has not been proposed for upstream inclusion
in D-Bus, and is believed to add considerable complexity to dbus-daemon.
There is no work in progress to add TOMOYO support.

Like D-Bus' built-in support for applying “policy” to message delivery,
AppArmor mediation of D-Bus messages has separate checks for whether the
sender may send a message to the recipient, and whether the recipient
may receive a message from the sender. Either or both of these can be
used, and the message will only succeed if both sending and receiving
were allowed. The sender's AppArmor profile determines whether it can
send (usually conditional on the profile name of the recipient), and the
recipient's AppArmor profile determines whether it can receive (either
conditional on the profile name of the sender, or unconditionally), so
some coordination between profiles is needed to express a particular
high-level security policy.

The main difference in features between the label-based mediation of
SELinux and SMACK and AppArmor's mediation is how granular the rules
can get. With the [D-Bus
additions to the AppArmor profile language][apparmor-dbus-additions], for instance, in
addition to specifying which services can be called upon by the
constrained process it is also possible to specify which interfaces and
paths are allowed or denied. This is unlike [SELinux mediation], which
only checks whether a given client can talk to a given service.
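
For example, a profile fragment using those additions could look like
the following sketch, where the bus, path, interface and peer label are
all hypothetical:

```shell
# Allow calling a single method on one specific service...
dbus send
    bus=session
    path=/org/example/MediaServer
    interface=org.example.MediaServer.Playback
    member=Play
    peer=(label=media-server),

# ...and receiving messages back from the same peer, but nothing else
dbus receive
    bus=session
    interface=org.example.MediaServer.Playback
    peer=(label=media-server),
```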

One caveat regarding fine-grained (interface- and path-based) D-Bus
access control is that it is often not directly useful, since the
interface and path are not necessarily sufficient to determine whether an
action should be allowed or denied (for example, [][Motivation for polkit] describes
why this is the case for the udisks service). As a result of
considerations like this, the developers of kdbus oppose the
addition of fine-grained access control within kdbus, and have indicated
that kdbus' access-control will never go beyond allowing or rejecting a
client communicating with a service.

> kdbus is a kernel module that has been proposed to take over the
> role of the user-space dbus-daemon in D-Bus on Linux systems.
> <https://github.com/gregkh/kdbus>

Software that is being used by large distributions is often more
thoroughly tested, and tested in more diverse scenarios. For this reason
Collabora believes that being used by one of the main distributions is a
very important feature to look for in an LSM.

Flexibility is also good to have, since more complex requirements can be
modeled more precisely. However, there is a trade-off between complexity
and flexibility that should be taken into consideration.

The recommendation on the selection of the framework is based on a
combination of the adoption of the framework by existing distributions,
features, maintainability, cost of deployment and the experience of the
developers involved. The table below contains a comparison of the
adoption of the existing security models. Only major distributions that
ship and enable the module by default are listed.

| Name | Distributions | Merged to mainline | Maintainer |
| -------- | -------------------------- | ------------------ | ---------- |
| SELinux | Fedora, Red Hat Enterprise | 08 Aug 2003 | NSA, Network Associates, Secure Computing Corp., Trusted Computer Solutions, Tresys |
| AppArmor | SUSE, OpenSUSE, Ubuntu | 20 Oct 2010 | SUSE, Canonical |
| SMACK | Tizen | 11 Aug 2007 | Intel, Samsung |
| TOMOYO | | 10 Jun 2009 | NTT Data Corp. |

> Comparison of LSM adoption and maturity

### Performance impact

The performance impact of MAC solutions depends heavily on the workload
of the application, so it's hard to rely upon a single metric. It seems
major adopters of these technologies are not too concerned about their
real-world impact, even though it may look significant in benchmarks,
since there are no recent measurements of performance impact for the
major MAC solutions.

That said, early tests indicate that SELinux has a performance impact
[in the region of 7% to 10%][SELinux-perf-impact], with tasks that are more CPU intensive
having *less* impact, since they are not making many system calls that
are checked. SELinux performs checks on every operation that touches a
labeled resource, so when reading or writing a file, every read/write
operation causes a check. That means making fewer, larger operations
instead of several smaller ones would also make the overhead go down.

AppArmor generally does fewer checks than SELinux, since only
operations that open, map or execute a file are checked: the individual
read/write operations that follow are not checked independently.
Novell's documentation and FAQs state that a 0.2% overhead is expected in
best-case scenarios – writing a big file, for instance – with a 2%
overhead in worst-case scenarios (an application touching lots of files
once).
Collabora's own testing on a 2012 x86-64 system puts the worst-case
scenario nearer the 5% range. The test measured reading
3000 small files with a hot disk cache, and ranged from ~89ms to ~94ms
average duration.

SMACK's performance characteristics should be similar to those of
SELinux, given their similar approach to the problem. SMACK has been
tested in a [TV embedded scenario][smack-embedded-tv], which showed performance
degradation from 0% all the way to 30% in a worst-case scenario of
deleting a 0-length file. Degradation varied greatly depending on the
benchmark used.

The only conclusion Collabora believes can be drawn from these numbers
is that an approach which checks less often (as is the case for
AppArmor) can be expected to have less impact on performance, in
general. That said, these numbers should be taken with a grain of salt,
since they haven't been measured on the exact same hardware and with the
exact same methodology. They may also suffer from bias caused by
benchmark tests which may not represent real-world usage scenarios.

No numbers exist measuring the impact on performance of the existing
D-Bus SELinux and AppArmor mediation, nor of the in-development SMACK
mediation. The overhead caused to each D-Bus call should be similar to
that of opening a file, since the same procedure is involved: a check
needs to be done each time a message is received from a client that is
contained. It should be noted that D-Bus is not designed to be used for
high-frequency communication due to its per-message overhead, so the
additional overhead for AppArmor should not be problematic unless D-Bus
is already being misused.

Where higher-frequency communication is required, D-Bus' file descriptor
passing feature can be used to negotiate a private channel (a pipe or
socket) between two processes. This negotiation can be as simple as a
single D-Bus method call, and only incurs the cost of AppArmor checks
once (when it is first set up). Subsequent messages through the private
channel bypass D-Bus and are not checked individually by AppArmor,
avoiding any per-message overhead in this case.

A more realistic and reliable assessment of the overhead imposed on a
real-world system would only be feasible on the target hardware, with
actual applications, where variables like storage device and file system
would also be better controlled.

### Conclusion

Collabora recommends the adoption of a MAC solution, specifically
AppArmor. It solves the problem of restricting applications to the
privileges they require to work, and is an effective solution to the
problem of protecting applications from other applications running as
the same user, which a DAC model is not able to provide.

SMACK and TOMOYO have essentially no adoption and support when compared
to solutions like SELinux and AppArmor, without providing any clear
advantages. MeeGo would have been a good testing ground for SMACK, but
the fact that it was never really deployed in enforcing mode means that
the potential was never realized.

SELinux offers the most flexible configuration of security policies, but
it introduces a lot of complexity in the setup and maintenance of the
policies, not only for distribution maintainers but also for application
developers and packagers, which increases the cost of the solution. It
is quite common to see Fedora users running into problems caused by
SELinux configuration issues.

AppArmor stands out as a good middle-ground between flexibility and
maintainability while at the same time having significant adoption: by
the biggest end-user desktop distribution (Ubuntu) and by one of the two
biggest enterprise distributors (SUSE). The fact that it is the security
solution already supported and included in the Ubuntu distribution,
which is the base of the Apertis platform, minimizes the initial effort
to create a secure baseline and reduces the effort needed to maintain
it. Since Ubuntu ships with AppArmor, some of the services and
applications will already be covered by the profiles shipped with
Ubuntu. Creation of additional profiles is made easy by the profile
generator tool that comes with AppArmor: it records everything the
application needs to do during normal operation, and allows the profile
to be further refined after the recording session is done.

Collabora will integrate and validate the existing Ubuntu profiles that
are relevant to the Apertis platform, as well as modify or write any
additional profiles required by the base platform. Collabora will also
assist in the creation of profiles for higher level applications that
ship with the final product, and with the strategy for profile management
for third party applications.

#### AppArmor Policy and management examples

Looking at a few examples might help better visualize how AppArmor
works, and what creating new policies entails. Let's look at a simple
policy file:

```shell
$ cat /etc/apparmor.d/bin.ping
/bin/ping {
  #include <abstractions/base>
  #include <abstractions/consoles>
  #include <abstractions/nameservice>

  capability net_raw,
  capability setuid,
  network inet raw,

  /bin/ping mixr,
  /etc/modules.conf r,

  # Site-specific additions and overrides. See local/README for details.
  #include <local/bin.ping>
}
$
```

> AppArmor policy shipped for ping in Ubuntu

This is the policy for the ping command. The binary is specified, then a
few includes that have common rules for the kind of binary ping is
(console), and services it consumes (nameservice). Then we have two
rules specifying capabilities that the program is allowed to use, and we
state the fact that it is allowed to perform raw network operations.
Then it's specified that the process should be able to memory-map (m)
/bin/ping, inherit confinement from the parent (i), execute the binary
/bin/ping (x) and read it (r). It's also specified that ping should be
able to read /etc/modules.conf.

If an attacker was able to execute arbitrary code by hijacking the ping
process, then that is all it would be able to do. No reading of
/etc/passwd would be allowed, for instance. If ping were a very core
feature of the device and started failing because of a bad policy, it is
possible to disable security enforcement just for ping, leaving the rest
of the system secured (something that would not be easily done with
SMACK or SELinux), by running *aa-disable* with ping's path as the
parameter, or by installing a symbolic link in /etc/apparmor.d/disable:

```shell
$ aa-disable /bin/ping
Disabling /bin/ping.
$ ls -l /etc/apparmor.d/disable/
total 0
lrwxrwxrwx 1 root root 24 Feb 20 19:38 bin.ping -> /etc/apparmor.d/bin.ping
```

> A symbolic link to disable the ping AppArmor policy

Note that *aa-disable* is only a convenience tool to unload a profile
and link it to the **/etc/apparmor.d/disable** directory.
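
Re-enabling the profile later is a matter of undoing those two steps; a
minimal sketch, assuming the stock apparmor_parser tool and the bin.ping
profile from the example above:

```shell
# Remove the symlink so the profile is no longer skipped at boot
$ rm /etc/apparmor.d/disable/bin.ping
# Load (or reload) the profile into the running kernel immediately
$ apparmor_parser -r /etc/apparmor.d/bin.ping
```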
The convenience script is not currently shipped in the image intended
for the target hardware; it is available in the repository, though, and
is included in the development and SDK images, since it makes it more
convenient to test and debug issues.

Note, also, that writing to the **/etc/apparmor.d/disable** directory is
required for creating the symlink there, and the UNIX DAC permissions
system already protects that directory against writing: only root is
able to write to it. As discussed in [][A note about root], if an
attacker becomes root the system is already compromised.

Also, as discussed in the System updates & rollback design, the system
partition will be mounted read-only, which is an additional protection
layer already. And in addition to that, the white-list approach discussed in
[][Implementing a white-list approach] will already deny writing to anywhere in the file system,
so anything running under the application manager will have an
additional layer of security imposed on it.

For these reasons, Collabora sees no need to add further protection,
such as AppArmor profiles specifically for protecting the
system against unauthorized disabling of profiles.

#### Profiles for libraries

AppArmor profiles are always attached to a binary. That means there is
no way to attach a profile to every program that uses a given library.
However, developers can write files called *abstractions* with rules
that can be included through the *\#include* directive, similar to how
libraries work for programming. Using this feature Collabora has written
rules for the WebKit library, for instance, that can be included
by the browser application as well as by any other application that uses
the library.

There is also concern with protecting internal, proprietary libraries,
so that they cannot be used by applications. In the profiles and
abstractions shipped with Apertis right now, all applications are
allowed to use all libraries that are installed in the public library
paths (such as **/usr/lib**).

The rationale for this is that libraries are only pieces of code that
could be included by the applications themselves, and it would be very
time-consuming and error-prone to have to specify each and every library
and module the application may need to use directly, or that would be
used indirectly by a library used by the application.

Collabora recommends that proprietary libraries that are used only by
one or a few services should be installed in a private location, such as
the application's directory. That would put those libraries outside of
the paths covered by the existing rules, and they would thus be out of
reach for any other application already, given the white-list approach
to session lockdown, as discussed in [][Implementing a white-list approach].

If that is not possible, because the library hardcodes paths or has some
other issue, an explicit deny rule could be added to the
**chaiwala-base** abstraction, which implements the general rules
that apply to most applications, including the one that allows access to
all libraries. Collabora can help decide what to do with specific
libraries through support tickets opened in the bug tracking system.

> Chaiwala was a development codename for parts of the Apertis system.
> The name is retained here for compatibility
> reasons.
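
As an illustration of what such a rule could look like (the library name
here is hypothetical, and the exact path would depend on the product),
the general allow rule and a more specific deny rule can coexist in the
same abstraction, with the deny rule taking precedence:

```shell
$ cat /etc/apparmor.d/abstractions/chaiwala-base
  # General rule: applications may map and read all public libraries
  /usr/lib/** mr,

  # Explicit exception: a proprietary library that applications must
  # not use (hypothetical name); deny rules override allow rules
  deny /usr/lib/libacme-internal.so* mr,
```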

#### Application installation and upgrades

For installations and upgrades to be performed, no changes to the
running system's security are necessary, since the processes that manage
upgrades, including the creation of the required snapshots, will have
sufficient privileges granted to them.

An application's profile is read at startup time. That means an
application that has been upgraded will only be confined by the new
rules after it has been restarted. The D-Bus integration works by
querying the kernel interface for the PID it is communicating with, not
its own, so D-Bus itself does not need to be restarted when new profiles
are installed.

When a *.deb* package is installed, its AppArmor profile will be
installed to the system AppArmor profile location (*/etc/apparmor.d/*),
but in the new snapshot created for the upgrade rather than on the
running system.

The new version of the upgraded package and its new profile will only
take effect after the system has been rebooted. For details about how
*.deb* packages will be handled when the system is upgraded, please see
the *System Updates and Rollback* document.

For more details on how applications from the store will be handled, the
*Applications* document produced by Collabora goes into detail about
how the permissions specified in the manifest will be transformed into
AppArmor profiles, and how they will be installed and loaded.

#### A note about root

As has been demonstrated in listing *AppArmor restriction applying to file system links*,
AppArmor can restrict even the
powers of the root user. Most platforms do not try to limit that power
in any way, since if an attacker has breached the system to get root
privileges it's likely that all bets are already off. That said, it
should be possible to limit the root user's ability to modify the
AppArmor profiles, leaving that task solely to the package manager (see
the Applications design for details).

#### Implementing a white-list approach

Collabora recommends the use of a white-list approach in which the
app-launcher will be confined by a policy that denies almost everything,
and specific permissions will be granted by the application profiles.
This means all applications will only be able to access what is
expressly allowed by their specific policies, providing Apertis with
a very tight least-privilege implementation.

A simple example of how that can be achieved using AppArmor is provided
in the following examples. The examples will emulate the proposed
solution by locking down a shell, which represents the Apertis
application launcher, and granting specific privileges to a couple of
applications so that they are able to access the files they require.

Listing *Sample profiles for implementing white-listing*
shows a profile for the shell, essentially denying it
access to everything by not allowing access to any files. It gives the
shell permission to run both ls and cat. Note that the flags *rix* are used
for this, meaning the shell can read the binaries (r) and execute them
(x); the *i* preceding the *x* tells AppArmor that these binaries should
inherit the shell's confinement rules, even if they have rules of their
own.

Then permission is given for the shell to run the *dconf* command. dconf
is GNOME's settings storage. Notice that we have *p* as the prefix for
*x* this time.
This means we want this application to use its own rules;
if no rules had been specified, then AppArmor would have fallen back to
using the shell's confinement rules.

```shell
$ cat /etc/apparmor.d/bin.zsh4
# Last Modified: Fri May 11 11:43:44 2012

#include <tunables/global>
/bin/zsh4 {
  #include <abstractions/base>
  #include <abstractions/consoles>
  #include <abstractions/nameservice>

  /bin/ls rix,
  /bin/cat rix,
  /usr/bin/dconf rpx,
  /bin/zsh4 mr,
  /usr/lib/zsh/*/zsh/* mr,
}

$ cat /etc/apparmor.d/usr.bin.dconf
# Last Modified: Fri May 11 11:59:09 2012

#include <tunables/global>
/usr/bin/dconf {
  #include <abstractions/base>
  #include <abstractions/nameservice>

  @{HOME}/.cache/dconf/user rw,
  @{HOME}/.config/dconf/user r,
  /usr/bin/dconf mr,
}
```

> Sample profiles for implementing white-listing

The profile for *dconf* allows reading (and only reading) the user
configuration for dconf itself, and allows reading and writing to the
cache. By using these rules we have both guaranteed that no application
executed from this shell will be able to look at or interfere with
dconf's files, and that dconf itself is able to function when used.
Here's the result:

```shell
% cat .config/dconf/user
cat: .config/dconf/user: Permission denied
% dconf read /apps/empathy/ui/show-offline
true
%
```

> Effects of white-list approach profiles

As shown by this example, the application launcher itself and any
applications which do not possess profiles can be restricted to the bare
minimum permissions, and applications can be given the more specific
privileges they require to do their job, using the *p* prefix to let
AppArmor know that's what is desired.

## polkit (PolicyKit)

polkit (formerly PolicyKit) is a service used by various upstream
components in Apertis, as a way to centralize security policy for
actions delegated by one process to another. The central problems
addressed by polkit are that the desired security policies for various
privileged actions are system-dependent and non-trivial to evaluate, and
that generic components such as the kernel's DAC and MAC subsystems do
not have enough context to understand whether a privileged action is
acceptable.

### Motivation for polkit

Broadly, there are two ways a process can carry out a desired action: it
can do it directly, or it can use inter-process communication to ask a
service to do that operation on its behalf. If the action is done
directly, the components that say whether it can succeed are the Linux
kernel's normal discretionary access control (DAC) permission checks
and, if configured, a mandatory access control module (MAC, see
[][Mandatory Access Control]).

However, the kernel's relatively coarse-grained checks are not
sufficient to express the desired policies for consumer-focused systems.
A frequent example is mounting file systems on removable devices: if a
user plugs in a USB stick with a FAT filesystem, it is reasonable to
expect the user interface layer to either mount it automatically, or let
the user choose to mount it. Similarly, to avoid data loss, the user
should be able to unmount the removable device when they have finished
with it.

Applying the desired policy using the kernel's permission checks is not
possible, because mounting and unmounting a USB stick is fundamentally
the same system call as mounting and unmounting any other file system,
and granting blanket mount permission is not desirable: if ordinary
users can make arbitrary mount system
calls, they can mount a file system that contains setuid executables and
achieve privilege escalation. As a result, the kernel disallows direct
mount and unmount actions by unprivileged processes; instead, user
processes may request that a privileged system process carries out the
desired action. In the case of device mounting, Apertis uses the
privileged udisks2 service to mount and unmount devices.

In environments that use a MAC framework like AppArmor, actions that
would normally be allowed can also become privileged: for instance, in a
framework for sandboxed applications, most apps should not be allowed to
record audio. The resulting AppArmor adjustments prevent carrying out
these actions directly. Again, the only way to achieve them is for a
service with a suitable privilege to carry out the
action (perhaps with a mandatory user interface prompt first, as in
certain iOS features).

These privileged requests are commonly sent via the D-Bus interprocess
communication (IPC) system; indeed, this is one of the purposes for
which D-Bus was designed. D-Bus has facilities for allowing or
forbidding messages between particular processes in a somewhat
fine-grained way, either directly or mediated by MAC frameworks.
However, this has the same issue as the kernel's checks for direct mount
operations: the generic D-Bus IPC framework does not understand the
context of the messages. For example, it can allow or forbid messages
that ask to mount a device, but cannot discriminate based on whether the
device in question is a removable device or a system partition, because
it does not have that domain-specific information.

This means that the security decision – having received this request,
should the service obey it? – must be at least partly made by the
service itself (for example udisks2), which does have the necessary
domain-specific context to do so.

The kdbus subsystem proposed for inclusion in the Linux kernel, which
aims to supersede the user-space implementation of D-Bus, has an
additional restriction: to minimize the amount of code in the TCB, it
only parses the parts of a message that are necessary for normal
message-routing. As a result, it does not discriminate between messages
by their interface, member name or object path, only by attributes of
the source and destination processes. This is another reason why
permission checking for services such as disk-mounting must be done at
least partly by the domain-specific service, such as udisks2.

The desired security policies for certain actions are also relatively
complex.
For example, udisks2 as deployed in a modern Linux desktop
system such as Debian 8 would normally allow mounting devices if and
only if:

  - the requesting user is *root*, or

  - the requesting user is in group *sudo*, or

  - all of

      - the device is removable or external, and

      - the mount point is in /media, and

      - the mount options are reasonable, and

      - the device's *seat* (in multi-seat computing) matches one of the
        seats at which the user is logged in, and

      - either

          - the user is in group *plugdev*, or

          - all of

              - the user is logged in locally, and

              - the user is logged in on the foreground virtual console

This is already complex, but it is merely a default, and is likely to be
adjusted further for special purposes (such as a single-user development
laptop, a locked-down corporate desktop, or an embedded system like
Apertis). It is not reasonable to embed these rules, or a sufficiently
powerful parser to read them from configuration, into every system
service that must impose such a policy.

### polkit's solution

polkit addresses this by dividing the authorization for actions into two
phases.

In the first phase, the domain-specific service (such as udisks2 for
disk-mounting) interprets the request and classifies it into one of
several ***actions*** which encapsulate the type of request. The
principle is that the *action* combines the verb and the object for the
desired operation: if a security policy would commonly produce different
results when performing the same verb on different objects, then they
are represented by different actions. For example, udisks2 divides the
high-level operation “mount a disk” into the actions
org.freedesktop.udisks2.filesystem-mount,
org.freedesktop.udisks2.filesystem-mount-system,
org.freedesktop.udisks2.filesystem-mount-other-seat and
org.freedesktop.udisks2.filesystem-fstab depending on attributes of the
disk. It also gathers information about the process making the request,
such as the user ID and process ID. polkit clients do not currently
record the LSM context (AppArmor profile, etc.) used by MAC frameworks,
but could be enhanced to do so.

In the second phase, the service sends a D-Bus request to polkit with
the desired action and the attributes of the process making the
request. polkit processes this request according to its configuration,
and returns whether the request should be obeyed.

In addition to “yes” or “no”, polkit security policies can request that
a user, or a user with administrative (root-equivalent) privileges,
authenticates themselves interactively; if this is done, polkit will not
respond to the request until the user has responded to the *polkit
agent*, either by authenticating or by cancelling the operation.

We recommend that this facility is not used with a password prompt in
Apertis, since that user experience would be highly distracting. For
operations that are deemed to be allowed or rejected by the platform
designer, either the policy should return “yes” or “no” instead of
requesting authorization, or the platform-provided polkit agent should
return that result in response to authorization requests without any
visible prompting. However, a prompt for authorization, without
requiring authentication, might be a desired UX in some cases.
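
As an illustration, with the polkit version currently shipped in
Debian-derived systems, such a platform policy can be expressed as a
Local Authority (*.pkla*) file; this is a minimal sketch, assuming a
hypothetical *app-media* user that should be allowed to mount removable
devices without prompting:

```shell
$ cat /etc/polkit-1/localauthority/50-local.d/10-removable-media.pkla
[Allow the hypothetical app-media user to mount removable devices]
Identity=unix-user:app-media
Action=org.freedesktop.udisks2.filesystem-mount
ResultAny=no
ResultInactive=no
ResultActive=yes
```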

### Recommendation

We recommend that Apertis should continue to provide polkit as a system
service. If this is not done, many system
components will need to be modified to refrain from carrying out the
polkit check.

If the desired security policy is merely that a subset of user-level
components may carry out privileged actions via a given system service,
and that all of those user-level components have equal access, we
recommend that Apertis' polkit configuration should allow and forbid
actions appropriately.

If it is required that certain user-level components can communicate
with a given system service with different access levels, we recommend
enhancing polkit so that it can query AppArmor, giving the *action* as a
parameter, before carrying out its own checks; this parallels what
dbus-daemon currently does for SELinux and AppArmor.

#### Alternative design: rely entirely on AppArmor checks

The majority of services that communicate with polkit do so through the
libpolkit-gobject library. This suggests an alternative design: the
polkit service and its D-Bus API could be removed entirely, and the
AppArmor check described above could be carried out in-process by each
service, by providing a “drop-in” compatible replacement for
libpolkit-gobject that performed an AppArmor query itself instead of
querying polkit.

We do not recommend this approach: it would be problematic for services
such as systemd that do not use libpolkit-gobject, it would remove the
ability for the policy to be influenced by facts that are not known to
AppArmor (such as whether a user is logged in and active), and it would
be a large point of incompatibility with upstream software.

## Resource Usage Control

Resource usage here refers to the limitation and prioritization of
hardware resource usage. Common resources to limit the use of are CPU,
memory, network, disk I/O and IPC.

The proposed solution is Control Groups ([cgroups]), which is a
Linux kernel feature to limit, account, isolate and prioritize resource
usage of process groups. It protects the platform from resource
exhaustion and DoS attacks. Groups of processes can be dynamically
created and modified. The groups are divided by certain criteria and
each group inherits limits from its parent group.

The interface to configure a new group is a pseudo file system that
contains directories to label the groups, and each directory can have
sub-directories (sub-groups). All those directories contain files that
are used to set the parameters or provide information about the groups.

By default, when the system is booted, the init system Collabora
recommends for this project, systemd, will assign separate control
groups to each of the system services. Collabora will further customize
the cgroups of the base platform to clearly separate system services,
built-in applications and third-party applications. Support will be
provided by Collabora for fine-tuning the cgroup profiles for the final
product.

### Imposing limits on I/O for block devices

The *blkio* subsystem is responsible for dealing with I/O operations
concerning storage devices. It exports a number of controls that can be
tuned by the *cgroups* subsystem. Those controls fall into one of two
possible strategies: setting proportional weights for different cgroups,
or setting absolute upper bounds.

The main advantage of using proportional weights is that it allows
the I/O bandwidth to be saturated – if nothing else is running, an
application always gets all of the available I/O bandwidth. If, however,
two or more processes in different cgroups are competing for access to
the I/O bandwidth, then each will get a share that is proportional to
the weight of its cgroup.

For example, suppose process A, in a cgroup with weight **10** (the
minimum value possible), is working on mass-processing of photos, and
process B is in a cgroup with weight **1000** (the maximum). If process
A is the only one making I/O requests, it has the full available I/O
bandwidth to itself. As soon as process B starts doing its
own I/O requests, however, it will get around **99%** of all the
requests that get through, while process A will have only **1%** for its
requests.

The second strategy is setting an absolute limit on the I/O bandwidth,
often called *throttling*. This is done by writing how many bytes per
second a cgroup should be able to transfer into a virtual file called
**blkio.throttle.read\_bps\_device**, which lives inside the cgroup. This
allows a great deal of control, but also means applications belonging to
that cgroup are not able to take advantage of the full I/O bandwidth
even if they are the only ones running at a given point in time.
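
A minimal sketch of how these two strategies look at the cgroups pseudo
file system level, assuming the v1 *blkio* controller is mounted in the
usual location and using hypothetical group names and device numbers:

```shell
# Create two groups: one for background jobs, one for time-critical work
mkdir /sys/fs/cgroup/blkio/photo-indexer /sys/fs/cgroup/blkio/navigation

# Proportional weights: 10 is the minimum, 1000 the maximum
echo 10 > /sys/fs/cgroup/blkio/photo-indexer/blkio.weight
echo 1000 > /sys/fs/cgroup/blkio/navigation/blkio.weight

# Throttling: limit reads from device 8:0 to 1 MiB/s for the background group
echo "8:0 1048576" > /sys/fs/cgroup/blkio/photo-indexer/blkio.throttle.read_bps_device

# A process is placed in a group by writing its PID to the tasks file
echo 1234 > /sys/fs/cgroup/blkio/photo-indexer/tasks
```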
Specifying a default weight for all applications, lower weights for
mass-processing jobs, and higher weights for time-critical applications
is a good first step in not only securing the system, but also improving
the user experience. The hard limit of an upper bound on I/O operations
can also serve as a way to make sure no application monopolizes the
system's I/O.

As is usual for tunables such as these, more specific details on what
settings should be specified for which applications is something that
needs to be developed in an empirical, iterative way, throughout the
development of the platform, and with actual target hardware. More
details on the *blkio* subsystem support for cgroups can be obtained
from the [Linux documentation][blkio-doc].

## Network filtering

Collabora recommends the use of the Netfilter framework to filter
network traffic. Netfilter provides a set of hooks inside the Linux
kernel that allow kernel modules to register callback functions with the
network stack. A registered callback function is then called back for
every packet that traverses the respective hook within the network
stack. iptables is a generic table structure for the definition of rule
sets; each rule within an iptables table consists of a number of classifiers
(iptables matches) and one connected action (iptables target).

Netfilter, when used with iptables, creates a powerful network packet
filtering system which can be used to apply policies to both IPv4 and
IPv6 network traffic. A base rule set that blocks all incoming
connections will be added to the platform by default, but port 80 access
will be provided for devices connected to the Apertis hotspot, so they
can access the web server hosted on the system. See the Connectivity
document for more information on how this will work.

The best way to do that seems to be to add accept rules for the
predefined private network address space the DHCP server will use for
clients of the hotspot.

Collabora will offer support in refining the rules for the final
product. Some network interactions may be handled by means of an
AppArmor profile instead.
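
A minimal sketch of such a rule set, assuming a hypothetical
192.168.101.0/24 range handed out by the hotspot's DHCP server:

```shell
# Block all incoming connections by default, but keep established flows working
iptables -P INPUT DROP
iptables -A INPUT -i lo -j ACCEPT
iptables -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT

# Let hotspot clients (hypothetical DHCP range) reach the local web server
iptables -A INPUT -s 192.168.101.0/24 -p tcp --dport 80 -j ACCEPT
```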

## Protecting the driver assistance system from attacks

All communication with the driver assistance system will be done through
a single service that can be talked to over D-Bus. This service will be
the only process allowed to communicate with the driver assistance
system. This means the service can belong to a separate user that will
be the only one capable of executing the binary, which is Collabora's
first recommendation.

The daemon will use an IP connection to the driver assistance system,
carried over a simple serial connection. The character device
entry for this serial connection shall therefore be protected by a
[udev] rule that assigns permissions allowing only this particular user
to access it. Access to the device entry should also be denied by the
AppArmor profile which covers all other applications, while making sure
the daemon's profile allows it.

Additionally, process namespace functionality can be used to make sure
the driver assistance network interface is only seen and usable by the
daemon that acts as gatekeeper. This is done by using a Linux-specific
flag to the [clone] system call, CLONE\_NEWNET, which creates a new
process whose network namespace is limited to viewing the loopback
interface.

The driver assistance communication daemon shall be started with this
flag on, and have the network interface for talking to the driver
assistance system assigned to its namespace. When a network interface
is assigned to a namespace, only processes in that namespace can see and
interact with it. This approach has the advantage of both protecting the
interface from processes other than the proxy daemon, and protecting the
daemon from the other network interfaces.

Having the process in its own cgroup also helps make it more robust,
since Linux tries to be fair among cgroups, so this is a good idea in
general. systemd already puts each service it starts in a separate
cgroup, so making the daemon a system service is enough to take
advantage of that fairness.

### Protecting devices whose usage is restricted

One or more cameras will be available for Apertis to control, but they
should not be accessed by any applications other than the ones required
to implement the driver assistance use cases. Cameras are made available
as device files in the /dev file system and can thus be controlled both
by DAC permissions and by making the default AppArmor policy deny
access to them as well.
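
A sketch of what the corresponding udev rules could look like, with
hypothetical device and user names (the real names depend on the
hardware and on the system's user layout):

```shell
$ cat /etc/udev/rules.d/99-driver-assistance.rules
# Serial link to the driver assistance system: only the gatekeeper
# daemon's user may open it (names are hypothetical)
SUBSYSTEM=="tty", KERNEL=="ttyS2", OWNER="da-proxy", GROUP="da-proxy", MODE="0600"

# Restricted camera devices: keep them away from regular applications
SUBSYSTEM=="video4linux", OWNER="root", GROUP="da-camera", MODE="0660"
```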
+
+The basic rule of protecting the user from web content in a browser is
+essentially assuming all content is untrusted. There are fewer APIs
+that allow a web application to interact with local resources such as
+local files than there are for native applications. The ones that do
+exist are usually made possible only through express user interaction,
+such as when the user selects a file to upload. Newer APIs that allow
+access to device capabilities such as the geolocation facilities only
+work after the user has granted permission.
+
+Browsers also try to make sure users are not fooled into believing
+they are on a different site than the one they are really on, a
+deception known as “phishing”, which is one of the main social
+engineering attacks used on the web. The basic SSL certificate checks,
+along with proper UI to warn the user about possible problems, can
+help prevent [man-in-the-middle] attacks. The HTTP library used by the
+Clutter port of WebKit is able to verify certificates using the
+system's trusted Certificate Authorities.
+
+> The *ca-certificates* package in Debian and Ubuntu carries those.
+
+In addition to those basic checks, WebKit includes a feature called
+*XSS Auditor* which implements a number of rules and checks to prevent
+[cross-site scripting] attacks, which are sometimes used to mix
+elements from both a fake and a legitimate site.
+
+The web browser can be locked down, like any other application, to
+limit the resources it can use up or get access to, and Collabora will
+be helping build an AppArmor profile for it. This is what protects the
+system from the browser in case it is exploited. By limiting the
+amount of damage the browser can do to the system itself, any exploits
+are also hindered from reaching the rest of the system.
+
+It is also important that the UI of the browser behaves well in
+general. For instance, user interfaces that make it easy to run
+executables downloaded from the web make the system more vulnerable to
+attacks. A user interface that makes it easier to distinguish the
+domain from the rest of the URI is [sometimes][omnibox] employed to
+help careful users be sure they are where they wanted to go.
+
+Automatically reloading pages that were loaded, or still loading, when
+the browser was terminated or crashed would also make it hard for the
+user to regain control of the browser. Existing browsers usually load
+an alternate page with a button the user can click to load the actual
+page, which is probably also a good idea for the Apertis browser.
+
+Collabora evaluated taking the WebKit Clutter port to the new WebKit2
+architecture as part of the Apertis project; as of 2012 it was deemed
+risky given the time and budget constraints.
+
+As of 2015, it has been decided that Apertis will switch away from
+WebKit Clutter and onto the GTK+ port, which is already built upon the
+WebKit2 architecture. The main feature of that architecture is that it
+has several different classes of processes: the UI process deals with
+user interaction, the Web processes render page contents, the Network
+process mediates access to remote data, and the Plugin processes are
+responsible for running plugins.
+
+The fact that the processes are separate provides a great way of
+locking them down properly. The Web processes, which are the most
+likely to be exploited in case of a successful attack, are also the
+ones that need the least privileges when it comes to interfacing with
+the system, so the AppArmor policies that apply to them can be very
+strict.
If a limited set of plugins is supported, the same can be applied to
+the Plugin processes. In fact, the WebKit codebase contains support
+for using seccomp filters (see [][Seccomp]) to sandbox the WebKit2
+processes. It may be a useful addition in the future.
+
+### Other sources of potential exploitation
+
+Historically, document viewers and image loaders have had
+vulnerabilities exploited in various ways to execute arbitrary code.
+PDF and spreadsheet files, for instance, feature domain-specific
+scripting languages. These scripting facilities are often sandboxed
+and limited in what they can do, but have been a source of security
+issues nevertheless. Images do not usually feature scripting, but
+their loaders have historically been the source of many security
+issues, caused by programming errors such as buffer overflows. These
+issues have been exploited to cause denial of service or run arbitrary
+code.
+
+Although these cases do deserve mention specifically for the inherent
+risk they bring, there is no silver bullet for this problem. Keeping
+applications up to date with security fixes, using hardening
+techniques such as stack protection, discussed in [][Stack protection],
+and locking the application down to its minimum access requirements
+are the tools that can be employed to reduce the risks.
+
+#### Launching applications based on MIME type
+
+It is common in the desktop world to allow launching an application
+through the files that it is able to read. For instance, while reading
+email the user may want to view an attachment; by “opening” the
+attachment, an application that is able to display that kind of file
+would be launched with the attachment as an argument.
+
+Collabora recommends that all kinds of application launching always go
+through the application manager. That way, there will be a centralized
+way of controlling and limiting the launching of applications through
+MIME or other types of content association, including being able to
+blacklist applications with known security issues, for instance.
+
+## Secure Software Distribution
+
+Secure software updates are a very important topic in the security of
+the platform. Checking the integrity and authenticity of the software
+packages installed on the system is crucial; an altered package might
+compromise the security of the whole platform.
+
+This section is only concerned with security aspects, not the whole
+software distribution and update mechanism, which will be covered in a
+separate document. The technology used for this is the same one used
+by Ubuntu. It's called [Secure APT] and was introduced in Debian in
+2005.
+
+Every Debian or Ubuntu package that is made available through an APT
+repository is hashed, and the hash is stored in the file that lists
+what packages are available, called the “Packages” file. That file is
+then hashed and the hash is stored in the [Release file], which is
+signed using a PGP private key.
+
+The public PGP key is shipped along with the product. When the package
+manager obtains updates or new packages, it checks that the signature
+on the Release file is valid and that all hashes match. The security
+of this approach relies on the fact that any tampering with the
+package or with the Packages file would make the hashes not match, and
+any changes done to the Release file would render the signature
+invalid.
+
+Additional public keys can be distributed through upgrades to a
+package that ships pre-installed; this is how Debian and Ubuntu
+distribute their public keys.
This mechanism can be used to add new third-party providers, or to
+replace the keys used by the app store. Collabora will provide
+documentation and assistance on setting up the package repositories
+and signing infrastructure.
+
+## Secure Boot
+
+The objective of [secure boot](secure-boot.md) is to ensure that the
+system is booted using sanctioned components. The extent to which this
+is ultimately taken will vary between implementations: some may use
+secure boot to avoid system kernel replacement, whilst others may also
+use it to ensure a [Trusted Execution Environment](op-tee.md) is
+loaded without interference.
+
+The steps required to implement secure boot are vendor specific, and
+thus the full specification for the solution depends on a definition
+from the specific silicon vendor, such as Freescale.
+
+A solution that has been adopted by Freescale in the past is High
+Assurance Boot (HAB), which ensures two basic attributes: authenticity
+and integrity. This is done by validating that the code image
+originated from a trusted source (authenticity), and verifying that
+the code is in its original form (integrity). HAB uses digital
+signatures to validate the code images and thereby establishes the
+security level of the system.
+
+To verify the signature, the device uses the Super Root Key (SRK),
+which is stored on-chip in non-volatile memory. To enhance the
+robustness of HAB security, multiple Super Root Keys (RSA public keys)
+are stored in internal ROM. Collabora recommends the use of SRKs with
+2048-bit RSA keys.
+
+In case a signature check fails because of an incomplete or broken
+upgrade, it should be possible to fall back to an earlier kernel
+automatically. Details of how that would be achieved can only be
+worked out after details about the hardware support for such a feature
+are provided by Freescale, and are probably best handled in the
+document about safely upgrading, system snapshots and rolling back
+updates.
+
+More discussion of system integrity checking, its limitations and
+alternatives can be found later on, when the IMA system is
+investigated. See [][Conclusion regarding IMA and EVM] in particular.
+
+The signature and verification processes are described in the
+Freescale white paper “Security Features of the i.MX31 and i.MX31L”.
+
+## Data encryption and removal
+
+### Data encryption
+
+The objective of data encryption is to protect user data, for security
+and privacy reasons. In the event of the car being stolen, for
+instance, important user data such as passwords should not be easily
+readable. While full disk encryption is both impractical and harmful
+to overall system performance, encryption of a limited set of data
+such as saved passwords is possible.
+
+The [Secrets D-Bus service] is a very practical way of storing
+passwords for applications. Its
+[GNOME implementation][GNOME-secret-service] provides an easy-to-use
+API, uses [locked-down memory][GNOME-locked-memory] when handling
+passwords, and stores them encrypted on disk. Collabora will provide
+these tools in the base platform and will support the implementation
+of secure password storage in the applications that will be developed.
+
+One unresolved issue for data encryption, whether via the Secrets
+service, a full-disk encryption system (as optionally used in Android)
+or some other implementation, is that a secret token must be provided
+in order to decrypt the encrypted data.
This is normally a password, but prompting for a password is likely to
+be undesired in an automotive environment. One possible implementation
+is to encode an unpredictable token in each car key, and use those
+tokens to decrypt stored secrets, with any of the keys for a
+particular car equally able to decrypt its data. In the simplest
+version of that implementation, loss of all of the car keys would
+result in loss of access to the encrypted data, but the car vendor
+could retain copies of the keys' tokens (and a record of which car is
+the relevant one) if desired.
+
+### Data removal
+
+A data removal feature is important to guarantee that personal user
+data that resides on the device can be removed before the car changes
+hands, for instance. Returning the device to its factory configuration
+is also important because it allows any customizations and preferences
+to be reset.
+
+Collabora recommends these features be implemented by making sure user
+data and settings are stored in a separate storage area. By removing
+this area, both user data and configuration are removed.
+
+Proper data wiping is only necessary to defeat forensic analysis of
+the hardware and would not pose a privacy risk for the simpler cases
+of the car changing hands. Such procedures rely on hardware support,
+so they would only be possible if that is in place, and even then they
+may be very time consuming. It's also worth noting that flash storage
+will usually perform wear levelling, which defeats software techniques
+such as writing over a block multiple times. Collabora recommends not
+supporting this feature.
+
+## Stack Protection
+
+It is recommended to enable stack protection, which guards against
+stack-based attacks such as a stack buffer overflow. Ubuntu, the
+distribution used as a base for Apertis, has enabled a stack
+protection mechanism offered by GCC called [SSP]. Modern processors
+have the capability to mark memory segments (like the stack) as
+executable or not, which can be used by applications to make
+themselves safer. Some initial tests with the Freescale kernel 2.6.38
+provided on the i.MX6 board show correct enforcement behaviour.
+
+Memory protection techniques like disabling execution of stack or heap
+memory are not possible with some applications, in particular
+execution engines such as programming language interpreters that
+include a just-in-time compiler, including the ones for JavaScript
+currently present in most web engines. Cases such as this, and also
+cases in which the limitations should apply but are not being
+respected, will be documented.
+
+Collabora will also document best practices for building software with
+this feature so that others can take advantage of stack protection for
+higher level libraries and applications.
+
+## Confining applications in containers
+
+### LXC Containment
+
+[LXC] is a solution that was developed to be a lightweight alternative
+to virtualization, built mainly on top of cgroups and namespaces. Its
+main focus is on servers, though. The goal is to separate processes
+completely, including using a different file system and a different
+network. This means the applications running inside an LXC container
+are, for all practical purposes, effectively running in a different
+system. While this does have the potential to help protect the main
+system, it also brings with it huge problems with the integration of
+the application with the system.
+
+For graphical applications, the X server will have to run with a TCP
+port open so that applications running in a container are able to
+connect; 3D acceleration will be impossible or very difficult to
+achieve for applications running in a container; and the D-Bus setup
+will be significantly more complex.
+
+Besides increasing the complexity of the system, LXC essentially
+duplicates functionality offered by cgroups, AppArmor, and the
+Netfilter firewall. When LXC was originally suggested, it was to be
+used only for system services. By using systemd, the Apertis system
+will already have every service on the system running in its own
+cgroup, and properly locked down by AppArmor profiles. This means
+adding LXC would only add redundancy and no additional value.
+
+Protection for the driver assistance system and limiting the damage
+root can do to the system can both be achieved by AppArmor policies,
+which can be applied to both system services and applications, as
+opposed to LXC, which would only be safely applicable to services.
+There are no advantages at all in using LXC for these cases. Limiting
+resources can also easily be done through cgroups, which are not
+limited to system services either. For these reasons, Collabora
+recommends against using LXC.
+
+#### Making X11, D-Bus and 3D work with LXC
+
+For the sake of completeness, this section provides a description of
+possible solutions for LXC's shortcomings.
+
+LXC creates what, for all practical purposes, is a separate system. X
+supports TCP socket connections, so it could be made to work, but that
+would require opening the TCP port, and that would be another
+interface that needs protection.
+
+D-Bus has the same pros and cons as X11: it can be connected to over a
+[TCP port][dbus-tcp], but that again increases the surface area that
+needs to be protected, and adds complexity for managing the
+connection. It is also not a popular use case, so it does not get a
+lot of testing.
+
+3D acceleration has not yet been made to work over networked X. All
+available solutions, such as [Virtual GL], involve a lot of copying
+back and forth, which would make performance suffer substantially;
+that needs to be avoided given the high importance of performance in
+the Apertis requirements.
+
+Collabora's perspective is that using LXC for applications running on
+the user session adds nothing that cannot be achieved with the means
+described in this document, while at the same time adding complexity
+and indirection.
+
+### The Flatpak framework
+
+[Flatpak] is a framework for “sandboxed” desktop applications, under
+development by several GNOME developers. Like LXC, it makes use of
+existing Linux infrastructure such as cgroups (see
+[][Resource usage control]) and namespaces.
+
+Unlike LXC, Flatpak's design goals are focused on confining individual
+applications within a system, which makes it an interesting technology
+for Apertis. We recommend researching Flatpak further, and evaluating
+its adoption as a way to reduce the development effort for our
+sandboxed applications.
+
+One secondary benefit of Flatpak is that, by altering the application
+bundle's view of the filesystem, it can provide a way to manage
+major-version upgrades without app-visible compatibility breaks: app
+bundles that were designed for an old “runtime” keep running in an
+environment closely resembling that old version, while app bundles
+that have been tested against the new “runtime” use the new one.
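+
+To illustrate the Linux mechanism this relies on: a private mount
+namespace lets one runtime or another be bind-mounted over the path an
+app bundle expects, without affecting the rest of the system. The
+sketch below shows only the underlying primitives, with hypothetical
+paths; Flatpak's real sandboxing helper does considerably more than
+this.
+
+```c
+/* Minimal sketch: give a child process a private filesystem view in
+ * which a specific runtime (hypothetical path) appears at the location
+ * the app bundle was built against.  Requires root (CAP_SYS_ADMIN).
+ * Build with: gcc -o runtime-view runtime-view.c */
+#define _GNU_SOURCE
+#include <sched.h>
+#include <stdio.h>
+#include <sys/mount.h>
+#include <unistd.h>
+
+int main (void)
+{
+    /* Detach from the system-wide mount namespace. */
+    if (unshare (CLONE_NEWNS) != 0) {
+        perror ("unshare");
+        return 1;
+    }
+    /* Make all mounts private so our changes stay invisible outside. */
+    if (mount (NULL, "/", NULL, MS_REC | MS_PRIVATE, NULL) != 0) {
+        perror ("mount private");
+        return 1;
+    }
+    /* Bind the old runtime over the path the app expects. */
+    if (mount ("/runtimes/old-1.0", "/app/runtime", NULL,
+               MS_BIND, NULL) != 0) {
+        perror ("bind mount");
+        return 1;
+    }
+    /* Anything executed from here on sees the old runtime. */
+    execl ("/bin/sh", "sh", (char *) NULL);
+    perror ("execl");
+    return 1;
+}
+```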
+
+## The IMA Linux Integrity Subsystem
+
+The basics of the Integrity Measurement Architecture ([IMA]) subsystem
+have been a part of Linux since version 2.6.30, viewing of the records
+was included in 2.6.36, and local verification was
+[submitted][kernel-local-verif] to the kernel maintainers in late
+January 2012. The goal of the subsystem is to make sure that a given
+set of files have not been altered and are authentic – in other words,
+provided by a trusted source. The mechanism used to provide these two
+features is essentially keeping a database of file hashes and RSA
+signatures. IMA does not protect the system from changes; it is simply
+a way of knowing that changes have been made, so that measures to fix
+the problem can be taken as quickly as possible. The authenticity
+module of IMA is still not available, so we won't be discussing it.
+
+In its simpler mode of operation, with the default policy, IMA will
+intercept calls that cause memory mapping and execution of a file, or
+any access done by root, and perform a hash of the file before the
+access goes through. This means execution of all binaries and loading
+of all libraries are intercepted. To hash a file, IMA needs to read
+the whole file and calculate a cryptographic sum of its contents. That
+hash is then kept in kernel memory and in the file system's extended
+attributes, for further verification after system reboots.
+
+This means that running any program will cause its file and any
+libraries it uses to be fully read and cryptographically processed
+before anything can be done with it, which causes a significant impact
+on the performance of the system. A 10% impact on boot time has been
+[reported][IMA LPC] by the IMA authors on a default Fedora system.
+There is no detailed information on how the test was performed, but
+the performance impact of IMA is mainly caused by the increased I/O
+required to read the whole of all executable and library files used
+during boot for hash verification. All executables will take longer to
+start up after a system boot because they need to be fully read and
+hashed to verify they match what's recorded (if any recording exists).
+
+The fact that the hashes are maintained in the file system's extended
+attributes, and are otherwise created from scratch when the file is
+first mapped or executed, means that in this mode IMA does not protect
+the system from modification while offline: an attacker with physical
+access to the device can boot using a different operating system,
+modify files, and reset the extended attributes. Those changes will
+not be seen by IMA.
+
+To overcome this problem, IMA is able to work with the hardware's
+trusted platform module through the extended verification module
+([EVM]), [added][kernel-EVM] to Linux in version 3.2: hashes of the
+extended attributes are signed by the trusted platform module (TPM)
+hardware, and written to the file system as another extended
+attribute. For this to work, though, TPM hardware is required. The
+fact that TPM modules are currently only widely available and
+supported for Intel-based platforms is also a problem.
+
+### Conclusion regarding IMA and EVM
+
+Both IMA and EVM are only useful for detecting that the system has
+been modified. They do so using a method that incurs a significant
+impact on performance, particularly application startup and system
+boot.
+
+Considering the strict boot-up time requirements for the Apertis
+system, this fact alone indicates that IMA and EVM are suboptimal
+solutions. However, EVM and IMA also suffer from being very new
+technologies as far as the Linux mainline is concerned, and have not
+been integrated and used by any major distributions. This means that
+adopting them in Apertis would incur significant development costs.
+
+In addition to that, Collabora believes that the goals of detecting
+breaches, protecting the base system and validating the authenticity
+of system files are attained in much better ways through other means,
+such as keeping the system files separate and read-only during normal
+operation, and using secure methods for installing and updating
+software, such as those described in [][Secure Software Distribution].
+
+For these reasons, Collabora advises against the usage of IMA and EVM
+for this project. An option to provide some security for the system in
+this case is making it hard to disconnect and remove the actual
+storage device from the system, to minimize the risk of tampering.
+
+## Seccomp
+
+[Seccomp] is a sandboxing mechanism in the Linux kernel. In essence,
+it is a way of specifying which system calls a process or thread
+should be able to make. As such, it is very useful for isolating
+processes that have strict responsibilities. For instance, a process
+that should not be able to write to or read from the disk should not
+be able to make an *open* system call.
+
+Most security tools that were discussed in this document provide a
+system-wide infrastructure and protect the system in a general way
+from outside the application's process. As opposed to those, seccomp
+is something that is very granular and very application-specific: it
+needs to be built into the application source code.
+
+In other words, applications need to be written with an architecture
+which allows a separation of concerns, isolating the work that deals
+with untrusted processes or data to a separate process or thread that
+will then use seccomp filters to limit the amount of damage it is able
+to do through system calls.
+
+For use by applications, seccomp needs to be enabled in the kernel
+that is shipped with the middleware. There is a library called
+**[libseccomp]**, which provides a more convenient way of specifying
+filters. Should this feature be used and made available through the
+SDK, seccomp support can be enabled in the kernel and libseccomp can
+be shipped in the middleware image provided by Collabora.
+
+The seccomp filter should be used on system services designed for
+Apertis whose architecture and intended functionality allow dropping
+privileges. Suppose, for instance, that Apertis has a health
+management daemon which needs to be able to kill applications that
+misbehave but has no need whatsoever to write data to a file
+descriptor. It might be possible to design that daemon to use seccomp
+to filter out system calls such as **open** and **write**. The
+**open** system call might need to be allowed through for opening
+files for reading, depending on how the health daemon monitors
+processes – it might need to read information from files in the
+**/proc** file system, for instance. For that reason, filtering for
+**open** would need to be more granular, just disallowing it being
+called with certain arguments.
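+
+A minimal sketch of what such a filter could look like with libseccomp
+is shown below. The daemon itself is hypothetical, as is the exact
+rule set: the sketch denies **write** outright and rejects **open**
+whenever the flags ask for write access, while read-only opens (for
+example of files under **/proc**) still succeed.
+
+```c
+/* Hypothetical seccomp filter for the health management daemon
+ * discussed above; not an actual Apertis component.
+ * Build with: gcc -o health-filter health-filter.c -lseccomp */
+#include <errno.h>
+#include <fcntl.h>
+#include <seccomp.h>
+
+static int install_filter (void)
+{
+    int ret = -1;
+    /* Allow all system calls by default and deny a few below; a
+     * stricter design would default to SCMP_ACT_KILL and whitelist. */
+    scmp_filter_ctx ctx = seccomp_init (SCMP_ACT_ALLOW);
+    if (ctx == NULL)
+        return -1;
+
+    /* The daemon never writes data to a file descriptor. */
+    if (seccomp_rule_add (ctx, SCMP_ACT_ERRNO (EPERM),
+                          SCMP_SYS (write), 0) < 0)
+        goto out;
+
+    /* Deny open() when write access is requested; O_RDONLY opens go
+     * through.  Current C libraries often use openat() instead, which
+     * would need equivalent rules. */
+    if (seccomp_rule_add (ctx, SCMP_ACT_ERRNO (EACCES), SCMP_SYS (open), 1,
+                          SCMP_A1 (SCMP_CMP_MASKED_EQ, O_ACCMODE, O_WRONLY)) < 0)
+        goto out;
+    if (seccomp_rule_add (ctx, SCMP_ACT_ERRNO (EACCES), SCMP_SYS (open), 1,
+                          SCMP_A1 (SCMP_CMP_MASKED_EQ, O_ACCMODE, O_RDWR)) < 0)
+        goto out;
+
+    ret = seccomp_load (ctx);
+out:
+    seccomp_release (ctx);
+    return ret;
+}
+
+int main (void)
+{
+    if (install_filter () != 0)
+        return 1;
+    /* From here on, the filtered daemon code would run. */
+    return 0;
+}
+```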
+
+Depending on how the health management daemon works, it might also not
+need to fork new processes itself, so filtering out system calls such
+as **fork** and **clone** is a possibility. As explained before, to
+take advantage of these opportunities, the architecture of such a
+daemon needs to be thought through from the outset with these
+limitations in mind. Opportunities such as the ones discussed here
+should be evaluated on a case-by-case basis for each service intended
+for deployment on Apertis.
+
+AppArmor and seccomp are complementary technologies, and can be used
+together. Some of their purposes overlap (for example, denying
+filesystem write access altogether could be achieved equally well with
+either technology), and they are both part of the kernel and hence in
+the TCB.
+
+The main advantage of seccomp over AppArmor is that it inhibits all
+system calls, however obscure: all system calls that were not
+considered when writing a policy are normally denied. Its in-kernel
+implementation is also simpler, and hence potentially more robust,
+than AppArmor's. This makes it suitable for containing a module whose
+functionality has been designed to be strongly focused on computation
+with minimal I/O requirements, for example the rendering modules of
+browser engines such as WebKit2. However, its applicability to code
+that was not designed to be suitable for seccomp is limited. For
+example, if the confined module has a legitimate need to open files,
+then its seccomp filter will need to allow broad categories of file to
+be opened.
+
+The main advantage of AppArmor over seccomp is that it can perform
+finer-grained checking on the arguments and context of a system call,
+for example allowing filesystem reads from files owned by the
+process's uid, but denying reads from other uids' files. This makes it
+possible to confine existing general-purpose components using
+AppArmor, with little or no change to the confined component.
+Conversely, it groups together closely-related system calls with
+similar security implications into an abstract operation such as
+“read” or “write”, making it considerably easier to write correct
+profiles.
+
+## The role of the app store process for security
+
+The model which is used for the application stores should preclude
+automated publishing of software to the store by developers. All
+software, including new versions of existing applications, will have
+to go through an audit before being published.
+
+The app store vetting process will generate the final package that
+will reach the store front. That means only signatures made by the app
+store curator's cryptographic keys will be valid, for instance.
+Another consequence of this approach is that the curator will not only
+have the final say on what goes in, but will also be able to change
+pieces of the package to, say, disallow a given permission the
+application's author specified in the application's manifest.
+
+This also presents a good opportunity to convert high-level
+descriptions, such as the permissions in the manifest and an overall
+description of the files used, into concrete configuration files such
+as AppArmor profiles in a centralized fashion. It also provides the
+curator with the ability to fine-tune those configurations for
+specific devices, or even to rework how a given resource is protected,
+with no need for intervention from third parties.
+
+Most importantly from the perspective of this document, the app store
+vetting process provides an opportunity for a final screening of
+submissions for security issues or bad practices, both in terms of
+code and user interface, and that should be taken into consideration.
+
+## How does security affect developer usage of a device?
+
+How security impacts a developer mode depends heavily on how that mode
+of work is specified. This chapter assumes that the two main use cases
+for such a mode are installing an application directly on the target
+through the Eclipse *install to target* plugin, and running a remote
+debugging session for the application, both of which are topics
+discussed in the SDK design.
+
+The *install to target* functionality that was made available through
+an Eclipse plugin uses an **sftp** connection with an arbitrary user
+and password pair to connect to the device. This means that putting
+the device in developer mode should ensure the **ssh** server is
+running and add an exception to the firewall rules discussed in
+[][Network filtering], to allow an inbound connection on port 22.
+
+Upon login, the SSH server will start user sessions that are not
+constrained by the AppArmor infrastructure. In particular, the
+white-list policy discussed in
+[][Implementing a white list approach] will not apply to ssh user
+sessions. This means the user the IDE connects with needs file system
+access to the directory where the application is to be installed, or
+must be able to tell the application installer to install it.
+
+The procedure for installing an application using an **sftp**
+connection is not too different from the *install app from USB stick*
+use case described in the Applications document; that similarity could
+be exploited to share code between these features.
+
+The main difference is that the developer mode would need to either
+ignore signature checking or accept a special “developer” signature
+for the packages. A decision on how to implement this piece of the
+feature needs a more complete assessment of the proposed solutions for
+how the app store and system DRM could work, and how open (or
+openable) the end-user devices will be.
+
+Running the application for remote debugging also requires that
+**gdbserver**'s default port, 2345, be open. Other than that, the main
+security constraint that will need to be tweaked when the system is
+put in developer mode is AppArmor. While under developer mode,
+AppArmor should probably be put in complain mode, since the
+application's own profile will not yet exist.
+
+## Further discussion
+
+This chapter lists topics that require further thinking and/or
+discussion, or a more detailed design. These may be better written as
+Wiki pages rather than formal designs, given that they require, and
+benefit from, iterating on an implementation.
+ + - Define which cgroups ([][Resource usage control]) to have, how they will be created + and managed + + - Define exactly what Netfilter rules ([][Network filtering]) should be installed + and how they will be made effective at boot time + + - Evaluate Flatpak ([][The Flatpak framework]) + +[Interface Discovery]: https://wiki.apertis.org/Interface_discovery + +[Data Sharing]: https://wiki.apertis.org/Data_sharing + +[PolicyKit]: http://live.gnome.org/PolicyKit + +[Android-responsiveness]: http://developer.android.com/guide/practices/design/responsiveness.html + +[bada-privileged-api]: http://developer.bada.com/help/index.jsp?topic=/com.osp.documentation.help/html/bada_overview/using_privileged_api.htm + +[iOS-sec]: http://images.apple.com/ipad/business/docs/iOS_Security_May12.pdf + +[iOS-sandbox]: http://www.usefulsecurity.com/2007/11/apple-sandboxes-part-1/ + +[TrustedBSD]: http://www.trustedbsd.org/mac.html + +[lwn-secure-computing]: http://lwn.net/Articles/475043/ + +[SELinux]: http://selinuxproject.org/page/Main_Page + +[TOMOYO Linux]: http://tomoyo.sourceforge.jp/ + +[SMACK]: http://schaufler-ca.com/ + +[Tizen]: https://developer.tizen.org/sdk.html + +[SMACK-reviews]: https://bugs.freedesktop.org/show_bug.cgi?id=47581 + +[AppArmor]: http://wiki.apparmor.net/index.php/Main_Page + +[aa-genprof]: http://wiki.apparmor.net/index.php/Profiling_with_tools + +[apparmor-dbus-additions]: http://wiki.apparmor.net/index.php/AppArmor_Core_Policy_Reference#DBUS_rules + +[SELinux mediation]: http://dbus.freedesktop.org/doc/dbus-daemon.1.html#lbAg + +[SELinux-perf-impact]: http://blog.larsstrand.no/2007/11/rhel5-selinux-benchmark.html + +[smack-embedded-tv]: http://www.embeddedalley.com/pdfs/Smack_for_DigitalTV.pdf + +[cgroups]: http://www.kernel.org/doc/Documentation/cgroups/cgroups.txt + +[blkio-doc]: http://www.kernel.org/doc/Documentation/cgroups/blkio-controller.txt + +[udev]: http://en.wikipedia.org/wiki/Udev + +[clone]: http://www.kernel.org/doc/man-pages/online/pages/man2/clone.2.html + +[man-in-the-middle]: https://en.wikipedia.org/wiki/Man-in-the-middle_attack + +[cross-site scripting]: http://en.wikipedia.org/wiki/Cross-site_scripting + +[omnibox]: http://chrome.blogspot.com.br/2010/10/understanding-omnibox-for-better.html + +[Secure APT]: http://wiki.debian.org/SecureApt + +[Release file]: http://wiki.debian.org/SecureApt#Secure_apt_groundwork:_checksums + +[Secrets D-Bus service]: http://standards.freedesktop.org/secret-service/re01.html + +[GNOME-secret-service]: https://wiki.gnome.org/Projects/GnomeKeyring + +[GNOME-locked-memory]: https://wiki.gnome.org/Projects/GnomeKeyring/Memory + +[SSP]: https://wiki.ubuntu.com/GccSsp + +[LXC]: http://lxc.sourceforge.net/ + +[dbus-tcp]: http://www.freedesktop.org/wiki/Software/DBusRemote + +[Virtual GL]: http://www.virtualgl.org/ + +[Flatpak]: https://flatpak.org/ + +[IMA]: http://sourceforge.net/apps/mediawiki/linux-ima/index.php?title=Main_Page + +[kernel-local-verif]: http://thread.gmane.org/gmane.linux.file-systems/61111/focus=61121 + +[IMA LPC]: http://linuxplumbersconf.org/2009/slides/David-Stafford-IMA_LPC.pdf + +[EVM]: http://sourceforge.net/apps/mediawiki/linux-ima/index.php?title=Main_Page#Linux_Extended_Verification_Module_.28EVM.29 + +[kernel-EVM]: http://kernelnewbies.org/Linux_3.2#head-03576b924303bb0fad19cabb35efcbd33eeed084 + +[Seccomp]: https://github.com/torvalds/linux/blob/master/Documentation/prctl/seccomp_filter.txt + +[libseccomp]: https://lwn.net/Articles/494252/ diff --git a/content/designs/sensors-and-actuators.md 
b/content/designs/sensors-and-actuators.md
new file mode 100644
index 0000000000000000000000000000000000000000..2d423c37cf3bc51b8ef963c50eff14b0bff1e812
--- /dev/null
+++ b/content/designs/sensors-and-actuators.md
@@ -0,0 +1,2275 @@
+---
+title: Sensors and actuators
+short-description: Possible approaches for exposing vehicle sensor information and allowing interaction with actuators
+ (implemented)
+authors:
+  - name: Philip Withnall
+---
+
+# Sensors and actuators
+
+## Introduction
+
+This document describes possible approaches to designing an API for
+exposing vehicle sensor information to application bundles on an
+Apertis system, and for allowing those bundles to interact with
+actuators.
+
+The major considerations with a sensors and actuators API are:
+
+  - Bandwidth and latency of sensor data such as that from parking
+    cameras
+
+  - Enumeration of sensors and actuators
+
+  - Support for multiple vehicles or accessories
+
+  - Support for third-party and OEM accessories and customisations
+
+  - Multiplexing of access to sensors
+
+  - Privilege separation between application bundles using the API
+
+  - Policy to restrict access to sensors (privacy sensitive)
+
+  - Policy to restrict access to actuators (safety critical)
+
+## Terminology and concepts
+
+### Vehicle
+
+For the purposes of this document, a *vehicle* may be a car, car
+trailer, motorbike, bus, truck tractor, truck trailer, agricultural
+tractor, or agricultural trailer, amongst other things.
+
+### Intra-vehicle network
+
+The *intra-vehicle network* connects the various devices and
+processors throughout a vehicle. This is typically a CAN or LIN
+network, or a hierarchy of such networks. It may, however, be based on
+Ethernet or other protocols.
+
+The vehicle network is defined statically by the OEM: all devices
+which are supported by the network have messages or bandwidth
+allocated for them at the time of manufacture. No devices which are
+not known at the time of manufacture can be supported by the vehicle
+network.
+
+### Inter-vehicle network
+
+An *inter-vehicle network* connects two or more *physically connected*
+vehicles together for the purposes of exchanging information. For
+example, a network between a truck tractor and trailer.
+
+An inter-vehicle network (for the purposes of this document) does
+*not* cover transient communications between separate cars on a
+motorway, for example; or between a vehicle and static roadside
+infrastructure it passes. These are car-to-car (C2C) and
+car-to-infrastructure (C2X) communications, respectively, and are
+handled separately.
+
+### Sensor
+
+A *sensor* is any input device which is connected to the vehicle’s
+network but which is not a direct part of the dashboard user
+interface. For example: parking cameras, ultrasonic distance sensors,
+air conditioning thermometers, light level sensors, etc.
+
+### Actuator
+
+An *actuator* is any output device which is connected to the vehicle’s
+network but which is not a direct part of the dashboard user
+interface. For example: air conditioning heater, door locks, electric
+window motors, interior lights, seat height motors, etc.
+
+### Device
+
+A sensor or actuator.
+
+## Use cases
+
+A variety of use cases for application bundle usage of sensor data are
+given below. Particularly important discussion points are highlighted
+at the bottom of each use case.
+
+### Augmented reality parking
+
+When parking, the feed from a rear-view camera should be displayed on
+the screen, with an overlay showing the distance between the back of
+the vehicle and the nearest object, taken from ultrasonic or radar
+distance sensors.
+
+The information from the sensors has to be synchronised with the
+camera, so correct distance values are shown for each frame. The
+latency of the output image has to be low enough not to be noticed by
+the driver when parking at low speeds (for example, 5 km/h).
+
+### Virtual mechanic
+
+Provide vehicle status information such as tyre pressure, engine oil
+level, washer fluid level and battery status in an application bundle
+which could, for example, suggest routine maintenance tasks which need
+to be performed on the vehicle.
+
+(Taken from
+*http://www.w3.org/2014/automotive/vehicle\_spec.html\#h2\_abstract*.)
+
+#### Trailer
+
+The driver attaches a trailer to their vehicle and it contains tyre
+pressure sensors. These should be available to the virtual mechanic
+bundle.
+
+### Petrol station finder
+
+Monitor the vehicle’s fuel level. When it starts to get low, find
+nearby petrol stations and notify the driver if they are near one.
+Note that this requires programs to be notified of fuel level changes
+while not in the foreground.
+
+### Sightseeing application bundle
+
+An application bundle could highlight sights of interest out of the
+windows by combining the current location (from GPS) with a direction
+from a compass sensor. Using a compass rather than the GPS’ velocity
+angle allows the bundle to work even when the vehicle is stationary.
+
+**Privacy concern**: Any application bundle which has access to
+compass data can potentially use dead reckoning to track the vehicle’s
+location, even without access to GPS data.
+
+#### Basic model vehicle
+
+If a vehicle does not have a compass sensor, the sightseeing bundle
+cannot function at all, and the Apertis store should not allow the
+user to install it on their vehicle.
+
+### Changing bundle functionality when driving at speed
+
+An application bundle may want to voluntarily change or disable some
+of its features when the vehicle is being driven (as opposed to
+parked), or when it is being driven fast (above a cut-off speed). It
+might want to do this to avoid distracting the driver, or because the
+features do not make sense when the vehicle is moving. This requires
+bundles to be able to access speedometer and driving mode information.
+
+If the application bundle is using a cut-off speed for this decision,
+it should not have to continually monitor the vehicle’s speed to
+determine whether the cut-off has been reached.
+
+### Changing audio volume with vehicle or cabin noise
+
+Bundles may want to adjust their audio output volume, or disable audio
+output entirely, in response to changes in the vehicle’s cabin or
+engine noise levels. For example, a game bundle could reduce its
+effects volume if a loud conversation can be heard in the cabin; but
+it might want to increase its effects volume if engine noise
+increases.
+
+**Privacy concern**: This should be implemented by granting access to
+overall ‘volume level’ information for different zones in the vehicle;
+but *not* by granting access to the actual audio input data, which
+would allow the bundle to record conversations. The overall volume
+level information should be sufficiently smoothed or high-latency that
+a malicious application cannot infer audio information from it.
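+
+As an illustration of that last point, the platform could expose only
+a heavily smoothed level per zone, updated at a low, fixed rate. The
+sketch below is purely hypothetical: the update rate, smoothing factor
+and function names are not part of any real Apertis API.
+
+```c
+/* Hypothetical smoothing of a per-zone cabin noise level; bundles
+ * would only ever see the return value, never the raw samples.
+ * Build with: gcc -o level level.c */
+#include <stdio.h>
+
+/* With one raw reading per second, this smooths over roughly half a
+ * minute: far too coarse to reconstruct any audio from. */
+#define SMOOTHING 0.033
+
+static double smoothed_level;
+
+/* Called by the platform at a low, fixed rate with the raw RMS level
+ * for one cabin zone. */
+static double zone_level_update (double raw_rms_level)
+{
+    smoothed_level += SMOOTHING * (raw_rms_level - smoothed_level);
+    return smoothed_level;
+}
+
+int main (void)
+{
+    /* Simulated raw levels: a brief loud conversation in the cabin. */
+    static const double samples[] = { 0.1, 0.1, 0.8, 0.9, 0.8, 0.1, 0.1 };
+    for (unsigned i = 0; i < sizeof samples / sizeof samples[0]; i++)
+        printf ("exposed level: %.3f\n", zone_level_update (samples[i]));
+    return 0;
+}
+```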
+
+### Night mode
+
+Programs may wish to change their colour scheme according to the
+ambient lighting level in a particular zone in the cabin, for example
+by switching to a ‘night mode’ with a dark colour scheme if driving at
+night, but not if an interior light is on. This requires bundles to be
+able to read external light sensors and the state of internal lights.
+
+### Weather feedback or traffic jam feedback
+
+A weather bundle may want to crowd-source information about local
+weather conditions to corroborate its weather reports. Information
+from external rain, temperature and atmospheric pressure sensors could
+be collected at regular intervals – even while the weather bundle is
+not active – and submitted to an online weather service as network
+connectivity permits.
+
+Similarly, a traffic jam or navigation bundle may want to crowd-source
+information about traffic jams, taking input from the speedometer and
+vehicle separation distance sensors to report to an online service
+about the average speed and vehicle separation in a traffic jam.
+
+### Insurance bundle
+
+A vehicle insurance company may want to offer lower insurance premiums
+to drivers who install its bundle, if the bundle can record
+information about their driving safety and submit it to the insurance
+company to give them more information about the driver’s riskiness.
+This would need information such as driving duration, distances
+driven, weather conditions, acceleration, braking frequency, frequency
+of using indicator lights, pitch, yaw and roll when cornering, and
+potentially vehicle maintenance information. It would also require
+access to unique identifiers for the vehicle, such as its VIN. The
+data would need to be collected regardless of whether the vehicle is
+connected to the internet at the time — so it may need to be stored
+for upload later.
+
+**Privacy concern**: Unique identification information like a VIN
+should not be given to untrusted bundles, as they may use it to track
+the user or vehicle.
+
+### Driving setup bundle
+
+An application bundle may want to control the driving setup — the
+position of the steering wheel, its rake, the position of the wing
+mirrors, the seat position and shape, whether the vehicle is in sport
+mode, etc. If a guest driver starts using the vehicle, they could
+import their settings from the same bundle on their own vehicle, and
+the bundle would automatically adjust the physical driving setup in
+the vehicle to match the user’s preferences. The bundle may want to
+restrict these changes to only happen while the vehicle is parked.
+
+### Odour detection
+
+A vehicle manufacturer may have invented a new type of interior sensor
+which can detect foul odours in the cabin. They want to integrate this
+into an application bundle which will change the air conditioning
+settings temporarily to clear the odour when detected. The Sensors and
+Actuators API currently has no support for this new sensor. The
+manufacturer does not expect their bundle to be used in other
+vehicles.
+
+### Air conditioning control
+
+An application bundle which connects to wrist watch body monitors on
+each of the passengers (through an out-of-band channel like Bluetooth,
+which is out of the scope of this document; see
+[][Bluetooth wrist watch and the Internet of Things]) may want to
+change the cabin temperature in response to thermometer readings from
+passengers’ watches.
+
+#### Automatic window feedback
+
+In order to do this, the bundle may also need to close the automatic
+windows, but one of the passengers has their arm hanging out of the
+window and the hardware interlock prevents it closing. The bundle must
+handle being unable to close the window.
+
+### Agricultural vehicle
+
+Apertis is used by an agricultural manufacturer to provide an IVI
+system for drivers to use in their latest tractor model. The
+manufacturer provides a pre-installed app for controlling their own
+brand of agricultural accessories for the tractor, so the driver can
+use it to (for example) control a tipping trailer and a baler which
+are hitched to each other behind the tractor, and also control a bale
+spear attached to the front of the tractor.
+
+### Roof box
+
+A car driver adds a roof box to their car, provided by a third party,
+containing a safety sensor which detects when the box is open. The
+built-in application bundle for alerting the driver to doors which are
+open when the vehicle starts moving should be able to detect and use
+this sensor to additionally alert the driver if the roof box is open
+when they start moving.
+
+### Truck installations
+
+Trucks are sold as a base ‘vanilla’ truck with a special installation
+on top, which is customised for the truck’s intended use. For example,
+a rubbish truck, tipping truck or police truck. The installation is
+provided by a third party who has a relationship with the base truck
+manufacturer. Each installation has specific sensors and actuators,
+which are to be controlled by an application bundle provided by the
+third party or by the manufacturer.
+
+### Compromised application bundle
+
+An application bundle on the system, A, is installed with permissions
+to adjust the driver’s seat position, which is one of the features of
+the bundle. Another application bundle, B, is installed without such
+permissions (as they are not needed for its normal functionality).
+
+**Safety critical**: An attacker manages to exploit bundle B and
+execute arbitrary code with its privileges. The attacker must not be
+able to escalate this exploit to give B permission to use actuators
+attached to the system, or extra sensors. Similarly, they must not be
+able to escalate the exploit to gain the privileges of bundle A, and
+hence bundle A’s permissions to adjust the driver’s seat position.
+
+### Ethernet intra-vehicle network
+
+A vehicle manufacturer wants to support high-bandwidth devices on
+their intra-vehicle network, and decides to use Ethernet for all
+intra-vehicle communications, instead of a more traditional CAN or LIN
+network. Their use of a different network technology should not affect
+enumeration or functionality of devices as seen by the user.
+
+### Development against the SDK
+
+An application developer wants to use a local gyroscope sensor
+attached to their development machine to feed input to their
+application while they are developing and testing it using the SDK.
+
+## Non-use-cases
+
+### Bluetooth wrist watch and the Internet of Things
+
+A passenger gets into the vehicle with a Bluetooth wrist watch which
+monitors their heart rate and various other biological variables. They
+launch their health monitor bundle on the IVI display, and it connects
+to their watch to download their recent activity data.
+
+This is not a use case for the Sensors and Actuators API; it should be
+handled by direct Bluetooth communication between the health monitor
+bundle and the watch.
If the Sensors and Actuators API were to support third-party devices
+(as opposed to ones specified and installed by the vehicle
+manufacturer or suppliers), having full support for all available
+devices would become a lot harder. Additionally, devices would then
+appear and disappear while the vehicle was running (for example, if
+the user turned off their watch’s Bluetooth connection while driving);
+this is not possible with fixed in-vehicle sensors, and would
+complicate the sensor enumeration API.
+
+More generally, this use-case is a specific case of the internet of
+things (IoT), which is out of scope for this design for the reasons
+given above. Additionally, supporting IoT devices would mean
+supporting wireless communications as part of the sensors service,
+which would significantly increase its attack surface due to the
+complexity of wireless communications, and the fact that they enable
+remote attacks.
+
+### Car-to-car and car-to-infrastructure communications
+
+In C2C and C2X communications, vehicles share data with each other as
+they move into range of each other or static roadside infrastructure.
+This information may be anything from braking and acceleration
+information shared between convoys of vehicles to improve fuel
+efficiency, to payment details shared from a car to toll booth
+infrastructure.
+
+While many of the use cases of C2C and C2X cover sharing of sensor
+data, the data being shared is typically a limited subset of what’s
+available on one vehicle’s network. Due to the dynamic nature of C2C
+and C2X networks, and the greater attack surface caused by the use of
+more complex technologies (radio communications rather than wired
+buses), a conservative approach to security suggests implementing C2C
+and C2X on a use-case-by-use-case basis, using system components
+separate from those handling intra-vehicle sensors and actuators. This
+ensures that control over actuators, which is safety critical, remains
+in a separate security domain from C2C and C2X, which must not have
+access to actuators on the local vehicle. See [][Security].
+
+An initial suggestion for C2C and C2X communications would be to
+implement them as a separate service which consumes sensor data from
+the sensors and actuators service just like other applications.
+
+### Buddied and vehicle fleet communications
+
+Similarly, long-range communications of sensor data between buddied
+vehicles or vehicles operating in a fleet (for example, a haulage or
+taxi fleet) should be handled separately from the sensors and
+actuators service, as such systems involve network communications.
+Typical use cases here would be reporting speed and fuel usage
+information from trucks or taxis back to headquarters; or letting two
+friends know each other’s locations and traffic conditions when both
+are doing the same journey.
+
+## Requirements
+
+### Enumeration of devices
+
+An application bundle must be able to enumerate devices in the
+vehicle, including information about where they are located in the
+vehicle (for example, so that it can adjust the position and setup of
+the driver’s seat but not the other seats; see
+[][Driving setup bundle]).
+
+It is expected that the set of devices in a vehicle may change
+dynamically while the vehicle is running, for example if a roof box
+were added while the engine was running ([][Roof box]).
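+
+As a purely hypothetical illustration of the kind of information an
+enumeration call needs to return, and of how a bundle might filter it
+by location, consider the sketch below; none of these names are real
+Apertis API, and the static table stands in for a live query.
+
+```c
+/* Hypothetical device enumeration; in reality the set of devices can
+ * change at runtime, e.g. when a roof box is attached.
+ * Build with: gcc -o enumerate enumerate.c */
+#include <stdio.h>
+#include <string.h>
+
+typedef enum { DEVICE_SENSOR, DEVICE_ACTUATOR } DeviceKind;
+
+typedef struct {
+    const char *id;    /* unique within the vehicle */
+    DeviceKind  kind;
+    const char *type;  /* strongly-typed device class */
+    const char *zone;  /* where in the vehicle it is located */
+} DeviceInfo;
+
+static const DeviceInfo devices[] = {
+    { "seat.driver.position", DEVICE_ACTUATOR, "seat-position",
+      "driver" },
+    { "seat.front-passenger.position", DEVICE_ACTUATOR,
+      "seat-position", "front-passenger" },
+    { "thermometer.external", DEVICE_SENSOR, "thermometer",
+      "exterior" },
+};
+
+int main (void)
+{
+    /* A driving setup bundle would pick out only the driver's seat. */
+    for (unsigned i = 0; i < sizeof devices / sizeof devices[0]; i++)
+        if (strcmp (devices[i].zone, "driver") == 0)
+            printf ("found: %s (%s)\n", devices[i].id, devices[i].type);
+    return 0;
+}
+```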
+
+Enumeration is particularly important for bundles, as the set of
+sensors in a particular vehicle will not change, but the set of
+sensors seen by a bundle across all the vehicles it’s installed in
+will vary significantly.
+
+### Enumeration of vehicles
+
+An application bundle must be able to enumerate vehicles connected to
+the inter-vehicle network, for example to discover the existence of
+hitched trailers or agricultural vehicles ([][Trailer],
+[][Agricultural vehicle]).
+
+It is expected that the set of vehicles may change dynamically while
+the vehicles are running.
+
+### Retrieving data from sensors
+
+An application bundle must be able to retrieve data from sensors. This
+data must be strongly typed in order to minimise the possibility of a
+bundle misinterpreting it, or of sensors from different manufacturers
+using different units, for example. Sensor data could vary in type
+from booleans (see [][Night mode]) through to streaming video data
+(see [][Augmented reality parking]). Sensor data may be processed by
+the system to make it more useful for application bundles; they do not
+need direct access to raw sensor data.
+
+### Sending data to actuators
+
+An application bundle must be able to send data to actuators
+(including invoking methods on them). This data must be strongly typed
+in order to minimise the possibility of a bundle misinterpreting it,
+or of actuators from different manufacturers using different units,
+for example. Actuator data could vary in type from booleans through to
+enumerated types (see [][Driving setup bundle]) and possibly larger
+data streams, though no concrete use cases exist for the latter.
+
+### Network independence
+
+The API should be independent of the network used to connect to
+devices — whether it be Ethernet, LIN or CAN; or whether the device is
+connected directly to a host processor
+([][Ethernet intra-vehicle network]).
+
+### Bounded latency of processing sensor data
+
+Certain sensor data has bounds on its latency. For example, pitch, yaw
+and roll information typically arrives from sensors as angular rates,
+and has to be integrated over time to be useful to application
+bundles — if sensor readings are missed, accuracy decreases. Sensor
+readings should be processed within the latency limits specified by
+the sensors. The limits on forwarding this processed data to bundles
+are less strict, though forwarding is expected to happen within the
+latency noticeable by humans (around 20ms), so that the data can be
+displayed in real time (see [][Augmented reality parking],
+[][Sightseeing application bundle],
+[][Changing audio volume with vehicle or cabin noise]).
+
+### Extensibility for OEMs
+
+New types of device may be developed after the Sensors and Actuators
+API is released. As the set of sensors in a vehicle does not vary
+after release, already-deployed versions of the API do not need to
+handle unknown devices. However, there must be a mechanism for OEMs or
+third parties working with them to define new device types when
+developing a new vehicle or an installation or accessory to go with
+it. In order for new devices to be usable by non-OEM application
+bundle authors, the Sensors and Actuators API must be updatable or
+extensible to support them. ([][Odour detection],
+[][Truck installations].)
+
+### Third-party backends
+
+If an OEM or third party produces a new device which can be connected
+to an existing vehicle, some code needs to exist to allow
+communication between the device and the Apertis sensors and actuators
+service.
This code must be written by the device manufacturer, as they know the
+hardware, and must be installable on the Apertis system before or
+after vehicle production (that is, as a system or non-system
+application). (See [][Agricultural vehicle], [][Roof box],
+[][Truck installations].)
+
+### Third-party backend validation
+
+If a third-party device is exposed to the sensors and actuators
+service, the third party might not be one who has contributed to or
+used Apertis before. There must be a process for validating backends
+for the sensors and actuators system, to ensure they have a certain
+level of code quality and security, in order to reduce the attack
+surface of the service as a whole. (See [][Roof box].)
+
+### Notifications of changes to sensor data
+
+All sensor data changes over time, so the API must support notifying
+application bundles of changes to sensor data they are interested in,
+without requiring the bundle to poll for updates (see
+[][Petrol station finder], [][Sightseeing application bundle],
+[][Changing bundle functionality when driving at speed],
+[][Changing audio volume with vehicle or cabin noise], [][Night mode],
+[][Odour detection]).
+
+Application bundles should be able to request notifications only when
+a sensor value crosses a given threshold, to avoid unnecessary
+notifications (see
+[][Changing bundle functionality when driving at speed]).
+
+### Uncertainty bounds
+
+Sensors are not perfectly accurate, and additionally a sensor’s
+accuracy may vary over time; each sensor measurement should be
+provided with uncertainty bounds. For example, the accuracy of
+geolocation by mobile phone tower varies with your location.
+
+This is especially relevant for data aggregated from multiple sensors,
+where the aggregate accuracy can be statistically modelled (for
+example, distance calculation from multiple sensors in
+[][Weather feedback or traffic jam feedback]).
+
+### Failure feedback
+
+As actuators are physical devices, they can fail. The API cannot
+assume automatic, immediate or successful application of its changes
+to properties, and needs to allow for feedback on all property
+changes.
+
+For example, the air conditioning coolant on an older vehicle might
+have leaked, leaving the air conditioning system unable to cool the
+cabin effectively. Application bundles which wish to set the
+temperature need to have feedback from a thermometer to work out
+whether the temperature has reached the target value (see
+[][Air conditioning control]).
+
+Another example is failure to close windows:
+[][Automatic window feedback].
+
+### Timestamping
+
+In-vehicle networks (especially Ethernet) may have variable latency.
+In order to correlate measurements from multiple sensors on the end of
+connections of varying latency, each measurement should have an
+associated timestamp, added at the time the measurement was recorded
+(see [][Augmented reality parking],
+[][Sightseeing application bundle]).
+
+### Triggering bundle activation
+
+Various use cases require a bundle to be able to trigger actions based
+on sensor data reaching a certain value, even if the program is not
+running at that time (see [][Petrol station finder],
+[][Changing audio volume with vehicle or cabin noise],
+[][Odour detection]). This requires some operating system service to
+monitor a list of trigger conditions even while the programs which set
+those triggers are not running, and to start the appropriate program
+so that it can respond to that trigger.
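+
+The sketch below illustrates the threshold-crossing logic such a
+monitoring service could apply. All names are hypothetical, and a real
+implementation would launch the owning bundle when a trigger fires
+rather than invoking an in-process callback.
+
+```c
+/* Hypothetical threshold trigger evaluation, firing once per
+ * crossing rather than once per sample.
+ * Build with: gcc -o triggers triggers.c */
+#include <stdbool.h>
+#include <stdio.h>
+
+typedef void (*TriggerFn) (const char *property, double value);
+
+typedef struct {
+    const char *property;   /* e.g. "fuel.level" */
+    double      threshold;
+    bool        fire_below; /* fire when the value drops below */
+    bool        active;     /* whether the trigger has fired */
+    TriggerFn   callback;
+} Trigger;
+
+static void on_low_fuel (const char *property, double value)
+{
+    printf ("%s dropped to %.2f: look for petrol stations\n",
+            property, value);
+}
+
+/* Called by the service as sensor samples arrive. */
+static void process_sample (Trigger *t, double value)
+{
+    bool crossed = t->fire_below ? value < t->threshold
+                                 : value > t->threshold;
+    if (crossed && !t->active)
+        t->callback (t->property, value);
+    t->active = crossed;
+}
+
+int main (void)
+{
+    Trigger low_fuel = { "fuel.level", 0.15, true, false, on_low_fuel };
+    static const double fuel[] = { 0.40, 0.22, 0.14, 0.13, 0.18 };
+    for (unsigned i = 0; i < sizeof fuel / sizeof fuel[0]; i++)
+        process_sample (&low_fuel, fuel[i]);
+    return 0;
+}
+```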

### Bulk recording of sensor data

Some bundles need to be able to record sensor measurements regularly,
with the intention of processing them (for example, uploading them to an
online service) at a later time (see [][Weather feedback or traffic jam feedback],
[][Insurance bundle]). This is not latency sensitive. As an
optimisation, a system service could record the sensor readings for
them, to avoid waking up the programs regularly.

Data recorded in this way must only be accessible to the application
bundle which requested it be recorded.

The requesting application bundle is responsible for processing the data
periodically, and deleting it once processed. The system must be able to
periodically overwrite recorded data if running low on space.

### Sensor security

As highlighted by the privacy concerns in several of the use cases
([][Sightseeing application bundle], [][Changing audio volume with vehicle or cabin noise],
[][Insurance bundle]), there are security concerns with
allowing bundles access to sensor data. The system must be able to
restrict access to some or all types of sensor data unless the user has
explicitly granted a bundle access to it. Bundles with access to sensor
data must be in separate security domains to prevent privilege
escalation ([][Compromised application bundle]).

### Actuator security

Control of actuators is safety critical but not privacy sensitive
(unlike sensors). The system must be able to restrict write access to
some or all types of actuator unless the user has explicitly granted a
bundle access to them. Bundles with access to actuators must be in
separate security domains to prevent privilege escalation ([][Compromised application bundle]).

### App store knowledge of device requirements

The Apertis store must know which devices (sensors *and* actuators) an
application bundle requires to function, and should not allow the user
to install a bundle which requires a device their vehicle does not have,
as the bundle would otherwise be useless ([][Basic model vehicle]).

### Accessing devices on multiple vehicles

The API must support accessing properties for multiple vehicles, such as
hitched agricultural trailers ([][Agricultural vehicle]) or car trailers
([][Trailer]). These vehicles may appear dynamically while the IVI system is
running, for example when the driver hitches a trailer with the engine
running.

**Note**: This requirement explicitly excludes C2C and C2X, which are
out of scope for this document. (See [][Car-to-car and car-to-infrastructure communications].)

### Third-party accessories

The API must support accessing properties of third-party accessories —
either dynamically attached to the vehicle ([][Roof box]) or installed
during manufacture ([][Truck installations]).

### SDK hardware support

The SDK must contain a backend for the system which allows appropriate
hardware attached to the developer’s machine to be used as sensors or
actuators for development and testing of applications (see
[][Development against the SDK]).

This backend must not be available in target images.

## Background on intra-vehicle networks

For the purposes of informing the interface design between the Sensors
and Actuators API and the underlying intra-vehicle network, some
background information is needed on typical characteristics of
intra-vehicle networks.

CAN and LIN are common protocols in use, though future development may
favour Ethernet or other protocols. In all cases, the OEM statically
defines all protocols, data structures, and devices which can be on the
network. Bandwidth is allocated for all devices at the time of
manufacture, even for devices which are only optionally connected to the
network — either because they’re a premium vehicle feature, or because
they are detachable, such as trailers. In these cases, data structures
on the network relating to those devices are empty when the devices are
not connected.

Sometimes flags are used in the protocol, such as ‘is a trailer
connected?’.

There are no common libraries for accessing vehicle networks: they
differ between OEMs.

## Existing sensor systems

This chapter describes the approaches taken by various existing systems
for exposing sensor information to application bundles, because it might
be useful input for Apertis’ decision making. Where available, it also
provides some details of the implementations of features that seem
particularly interesting or relevant.

### W3C Vehicle Information Access API

The W3C [Vehicle Information Access API] is a network-independent API
for getting and setting vehicle properties from web apps using
JavaScript. It defines a JavaScript framework (the Vehicle Information
Access API) and a standardised set of vehicle properties: the [Vehicle
Data specification].

The API is defined in terms of properties of the vehicle, rather than in
terms of specific sensors. For example, it exposes temperatures as
‘internal temperature’ and ‘external temperature’ rather than
enumerating and allowing access to several different thermometers.

The Vehicle Data specification has good coverage of general vehicle
properties, but does not cover interactive use cases like parking
sensors or cameras.

Although the specification is defined in JavaScript, its main
contribution is the standardised set of properties in the Vehicle Data
specification, which could be exposed by an API in any language.

Extensibility is a core part of the API, although it is not especially
[rigorously defined][w3c-spec-ext]. This means that new sensor types and vehicle
properties could be added by Apertis or its OEMs and then used in
application bundles.

The [W3C Automotive and Web Platform] Business Group is quite large
and active (126 members, last active December 2014), so this
specification stands a reasonable chance of being adopted and continuing
to be maintained.

### GENIVI Web API Vehicle

The [GENIVI Web API Vehicle] (sic) is a proof of concept API for
exposing and manipulating vehicle information to GENIVI apps via a
JavaScript API. It is very similar to the W3C Vehicle Information Access
API, and seems to expose a very similar set of properties.

The [Web API Vehicle] is a proxy for exposing a separate Vehicle Interface
API within an HTML5 engine. The Vehicle Interface API itself is
apparently a D-Bus API for sharing vehicle information between the CAN
bus and various clients, including this Web API Vehicle and any native
apps. Unfortunately, the Vehicle Interface API seems to be unspecified
as of August 2015, at least in publicly released GENIVI documents.

> <http://git.projects.genivi.org/?p=web-api-vehicle.git;a=blob_plain;f=doc/WebAPIforVehicleDataRI.pdf;hb=HEAD>
> Section 2.2.3

The Web API Vehicle has the same features and scope as the W3C API, but
its implementation is clumsier, relying a lot more on seemingly
unstructured magic strings for accessing vehicle properties.

> <http://git.projects.genivi.org/?p=web-api-vehicle.git;a=blob_plain;f=doc/WebAPIforVehicleData.pdf;hb=HEAD>

It was last publicly modified in May 2013, and might not be under
development any more. Furthermore, a lot of the wiki links in the
specification link to private and inaccessible data on
collab.genivi.org.

### Apple HomeKit

[Apple HomeKit] is an API to allow apps on Apple devices to interact
with sensors and actuators in a home environment, such as garage doors,
thermostats, thermometers and light switches, amongst others. It is
designed explicitly for the home environment, and does not consider
vehicles. However, as it is effectively an API for allowing interactions
between sandboxed apps and external sensors and actuators, it bears
relevance to the design of such an API for vehicles.

At its core, HomeKit allows enumeration of devices (‘accessories’) in a
home. A large part of its API is dedicated to grouping these into homes,
rooms, service groups and zones so that collections of accessories can
be interacted with simultaneously.

Each accessory implements one or more ‘services’ which are defined
interfaces for specific functionality, such as a light switch interface,
or a thermostat interface. Each service can expose one or more
‘characteristics’ which are readable or writeable properties of that
interface, such as whether a light is on, the current temperature
measured by a thermostat, or the target temperature for the thermostat.

It explicitly maintains separation between *current* and *target* states
for certain characteristics, such as temperature controlled by a
thermostat, acknowledging that changes to physical systems take time.

A second part of the API implements ‘actions’ based on sensor values,
which are arbitrary pieces of code executed when a certain condition is
met. Typically, this would be to set the value of a characteristic on
some actuator when the input from another sensor meets a given
condition. For example, switching on a group of lights when the garage
door state changes to ‘open’ as someone arrives in the garage.

Critically, triggers and actions are handled by the iOS operating
system, so are still checked and executed when the app which created
them is not active.

HomeKit has a [simulator] for developing apps against.

### Apple External Accessory API

As a precursor to HomeKit, Apple also supports an [External Accessory
API], which allows any iOS device to interact with accessories
attached to the device (for example, through Bluetooth).

In order to use the External Accessory API, an app must list the
accessory protocols it supports in its app manifest. Each accessory
supports one or more protocols, defined by the manufacturer, which are
interfaces for aspects of the device’s functionality. They are
equivalent to the ‘services’ in the HomeKit API. The code to implement
these protocols is provided by the manufacturer, and the protocols may
be proprietary or standard.

Each accessory exposes [versioning information][accessory-versioning] which can be used to
determine the protocol to use.

All communication with accessories is done via [sessions][accessory-sessions], rather
than one-shot reads or writes of properties. Each session is a
bi-directional stream along which the accessory’s protocol is
transmitted.

### iOS CarPlay

iOS [CarPlay] is a system for connecting an iOS device to a car’s
IVI system, displaying apps from the phone on the car’s display and
allowing those apps to be controlled by the car’s touchscreen or
physical controls. It *[does not give][carplay-no-sensor]* the iOS device access to car
sensor data, and hence is not especially relevant to this design.

It [does not][carplay-api] (as of August 2015) have an API for integrating apps with
the IVI display.

Most vehicle manufacturers have pledged support for it in the coming
years.

### Android Auto

[Android Auto] is very similar to iOS CarPlay: a system for
connecting a phone to the vehicle’s IVI system so it can use the display
and touchscreen or physical controls. As with CarPlay, it does *not*
give the Android device access to vehicle sensor data, although (as of
August 2015) that is planned for the future.

As of August 2015, it [has an API for apps][android-auto-api], allowing audio and messaging
apps to improve their integration with the IVI display.

Most vehicle manufacturers have pledged support for it in the coming
years.

### MirrorLink

[MirrorLink] is a proprietary system for integrating phones with the
IVI display — it is similar to iOS CarPlay and Android Auto, but
produced by the [Car Connectivity Consortium] rather than a device
manufacturer like Apple or Google.

The specifications for MirrorLink are proprietary and only available to
registered developers. [Their brochure][mirrorlink-brochure] (page 2) states that
support for allowing apps access to sensor data is planned for the
future (as of 2014).

MirrorLink is apparently the technology behind Microsoft’s
[Windows in the Car] system, which was announced in April 2014.

### Android Sensor API

[Android's Sensor API] is a mature system for accessing mobile phone
sensors. There is a more constrained set of sensors available in phones
than in vehicles, hence the API exposes individual sensors, each
implementing an interface specific to its type of sensor (for example,
accelerometer, orientation sensor or pressure sensor). The API places a
lot of emphasis on the physical limitations of each sensor, such as its
range, its resolution, and the uncertainty of its measurements.

The sensors required by an app are listed in its manifest file, which
allows the Google Play store to filter apps by whether the user’s phone
has all the necessary sensors.

As Android runs on a multitude of devices from different manufacturers,
each with different sensors, enumeration of the available sensors is
also an emphasis of the API, using its [SensorManager] class.

[Sensors][Android-sensors] can be queried by apps, or apps can register for notifications
when sensor values change, including when the app is not in the
foreground or when the device is asleep (if supported by the
sensor). Apps can also [register][Android-sensor-register] for notifications when sensor
values satisfy some trigger, such as a ‘significant’ change.

### Automotive Message Broker

[Automotive Message Broker] is an Intel OTC project to broker
information from the vehicle networks to applications, exposing a
[tweaked version][broker-API] of the W3C Vehicle Information Access API
(with a few types and naming conventions changed) over D-Bus to apps,
and interfacing with whatever underlying networks are in use in the
vehicle. In short, it has the same goals as the Apertis Sensors and
Actuators API.

As of August 2015, it was last modified in June 2015, so is an active
project (although Tizen is in decline, so this may change). Although it
is written in C++, it uses GNOME technologies like GObject
Introspection, but it also uses Qt. Its main daemon is the Automotive
Message Broker daemon, ambd.

One area where it differs from the Apertis design is [][Security];
it does not implement the polkit integration which is key to
the vehicle device daemon security domain boundary. Modifying the
security architecture of a large software project after its initial
implementation is typically hard to get right.

Another area where ambd differs from the Apertis design is in the
backend: ambd uses multiple plugins to aggregate vehicle properties from
many places. Apertis plans to use a single OEM-provided,
vehicle-specific plugin.

### AllJoyn

The [AllJoyn Framework] is an internet of things (IoT) framework
produced under the Linux Foundation banner and the [AllSeen
Alliance]. (Note that IoT frameworks are explicitly out of scope
for this design; this section is for background information only — see
[][Bluetooth wrist watch and the Internet of Things].) It allows devices
to discover and communicate with each other. It is freely available
(open source) and has components which run on various different
operating systems.

As a framework, it abstracts the differences between physical
transports, providing a session API for devices to use in one-to-one or
one-to-many configurations for communication. A lot of its code is
orientated towards implementing different physical transports.

It provides a security API for establishing different trust models
between devices.

It provides various communication layer APIs for implementing RPC or raw
I/O streams (or other things in-between) between devices. However, it
does not specify the protocols which devices must use — they are
specified by the device manufacturer.

AllJoyn provides common services for setting up new devices, sending
notifications between devices, and controlling devices. It provides one
example service for controlling lamps in a house, where each lamp
manufacturer implements a well-defined OEM API for their lamp, and each
application uses the lamp service API which abstracts over these.

## Approach

Based on the above research ([][Existing sensor systems]) and [][Requirements], we
recommend the following approach as an initial sketch of a Sensors and
Actuators API.

### Overall architecture

### Vehicle device daemon

Implement a vehicle device daemon which aggregates all sensor data in
the vehicle, and multiplexes access to all actuators in the vehicle
(apart from specialised high bandwidth devices; see [][High bandwidth or low latency sensors]).
It will connect to whichever underlying buses are
used by the OEM to connect devices (for example, the CAN and LIN buses);
see [][Hardware and app APIs].
The implementation may be new, or may be a +modified version of ambd, although it would need large amounts of rework +to fit the Apertis design (see [][Automotive message broker]). + +The daemon needs to receive and process input within the latency bounds +of the sensors. + +The daemon should expose a D-Bus interface which follows the W3C [Vehicle +Information Access API]. The set of supported properties, out of +those defined by the [Vehicle Data specification], may vary between +vehicles — this is as expected by the specification. It may vary over +time as devices dynamically appear and disappear, which programs can +monitor using the [Availability interface]. + +The W3C specification was chosen rather than something like HomeKit due +to its close match with the requirements, its automotive background, and +the fact that it looks like an active and supported specification. +Furthermore, HomeKit requires each device to define one or more +protocols to use, allowing for arbitrary flexibility in how devices +communicate with the controller. All the sensor and actuator use cases +which are relevant to vehicles need only a property interface, however, +which supports getting and setting properties, and being notified when +they change. + +If an OEM, third party or application developer wishes to add new sensor +or actuator types, they should follow the [extension process][w3-extending] and +request that the extensions be standardised by Apertis — they will then +be released in the next version of the Sensors and Actuators API, +available for all applications to use. If a vehicle needs to be released +with those sensors or actuators in the meantime, their properties must +be added to the SDK API in an OEM-specific namespace. Applications from +the OEM can use properties from this namespace until they are +standardised in Apertis. See [][Property naming]. + +Multiple vehicles can be supported by exposing new top-level instances +of the [Vehicle interface][w3-vehicle-interface]. For example, each vehicle could be +exposed as a new object in D-Bus, each implementing the Vehicle +interface, with changes to the set of vehicles notified using an +interface like the standard [D-Bus ObjectManager] interface. + +This API can be exposed to application bundles in any binding language +supported by GObject Introspection (including JavaScript), through the +use of a client library, just as with other Apertis services. The client +library may provide more specific interfaces than the D-Bus interface — +the D-Bus API may be defined in terms of string keywords and variant +values, whereas the client library API may be sensor-specific strongly +typed interfaces. + +### Hardware and app APIs + +The vehicle device daemon will have two APIs: the D-Bus SDK API exposed +to applications, and the hardware API it consumes to provide access to +the CAN and LIN buses (for example). The SDK API is specified by +Apertis, and is standardised across all Apertis deployments in vehicles, +so that a bundle written against it will work in all vehicles (subject +to the availability of the devices whose properties it uses). + +**Open question**: The exact definition of the SDK API is yet to be +finalised. It should include support for accessing multiple properties +in a single IPC round trip, to reduce IPC overheads. 
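
To make the benefit of such bulk access concrete, the following Python
sketch (using PyGObject, one of the GObject Introspection bindings
mentioned above) fetches every property of a vehicle in a single IPC
round trip. It borrows the `GetAllAttributes` method and naming from the
recommended hardware API later in this document ([][Recommended hardware API design]);
the bus type, the vehicle object path and the use of the empty string as
the root zone are assumptions made for illustration only.

```python
from gi.repository import Gio, GLib

# One IPC round trip fetches every attribute in a zone and the zones
# beneath it, rather than one call per property.
vehicle = Gio.DBusProxy.new_for_bus_sync(
    Gio.BusType.SYSTEM,                # assumed bus
    Gio.DBusProxyFlags.NONE,
    None,
    'org.apertis.Rhosydd1',
    '/org/apertis/Rhosydd1/vehicle0',  # hypothetical vehicle ID
    'org.apertis.Rhosydd1.Vehicle',
    None)

current_time, attributes = vehicle.call_sync(
    'GetAllAttributes',
    GLib.Variant('(s)', ('',)),        # '' assumed to mean the root zone
    Gio.DBusCallFlags.NONE, -1, None).unpack()

# Each entry carries the value, its accuracy and the update timestamp.
for zone, name, (value, accuracy, updated), _metadata in attributes:
    print('%s %s = %r (accuracy %g, age %d)' % (
        zone, name, value, accuracy, current_time - updated))
```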

The hardware API is also specified by Apertis, and implemented by one or
more backend services which connect to the vehicle buses and devices and
expose the information as properties understandable by the vehicle
device daemon, using the hardware API.

At least one backend service must be provided by the vehicle OEM, and it
must expose properties from the vehicle’s standard devices from the
vehicle buses. Other backend services may be provided by the vehicle OEM
for other devices, such as optional devices for premium vehicle models,
or truck installations. Similarly, backend services may be provided by
third parties for other devices, such as after-market devices like roof
boxes. Application bundles may provide backend services as well, to
expose hardware via application-specific protocols. Consequently,
backend services will likely be developed in isolation from each other.

Each backend service must expose zero or more properties — it is
possible for a backend to expose zero properties if the device it
targets is not currently connected, for example.

Each backend service must run as a separate process, communicating with
the vehicle device daemon over D-Bus using the hardware API. The
hardware API needs the following functionality:

 - Bulk enumeration of vehicles

 - Bulk notification of changes to vehicle availability

 - Bulk enumeration of properties of a vehicle, including readability
   and writability

 - Bulk notification of changes to property availability, readability
   or writability

 - Subscription to and unsubscription from property change
   notifications

 - Bulk property change notifications for subscribed properties

The hardware API will be roughly similar in shape to the SDK API, and
hence a lot of the complexity of the vehicle device daemon will be in
the vehicle-specific backends (both operate on properties — see
[][Properties vs devices]).

As vehicle networks differ, the backend used in a given vehicle has to
be developed by the OEM developing that vehicle. Apertis may be able to
provide some common utility functions to help in implementing backends,
but cannot abstract all the differences between vehicles. (See
[][Background on intra-vehicle networks].)

It is expected that the main backend service for a vehicle, provided by
that vehicle’s OEM, will access the vehicle-specific network
implementation running in the automotive domain, and hence will use the
[inter-domain communications connection](inter-domain-communication.md). In order to avoid
additional unnecessary inter-process communication (IPC) hops, it is
suggested that the main backend service acts as *the* proxy for sensor
data on the inter-domain connection, rather than communicating with a
separate proxy in the CE domain — but only if this is possible within
the security requirements on inter-domain connection proxies.

The path for a property to pass from a hardware sensor through to an
application is long: from the hardware sensor, to the backend service,
through the D-Bus daemon to the vehicle device daemon, then through the
D-Bus daemon again to the application. This is at least 5 IPC hops,
which could introduce non-negligible latency. See [][High bandwidth or low latency sensors] for
discussion about this.

#### Interactions between backend services

In order to keep the security model for the system simple, backend
services must not be able to interact.
Each device must be exposed by
exactly one backend service — two backend services cannot expose the
same device, nor can they extend devices exposed by other backend
services.

The vehicle device daemon must aggregate the properties exposed by its
backends and choose how to merge them. For example, if one backend
service provides a ‘lights’ property as an array with one element, and
another backend service does similarly, the vehicle device daemon should
append the two and expose a ‘lights’ array with both elements in the SDK
API.

For other properties, the vehicle device daemon should combine scalar
values. For example, if one backend service exposes a rain sensor
measurement of 4/10, and another exposes a second measurement (from a
separate sensor) of 6/10, the SDK API should expose an aggregated rain
sensor measurement of (for example) 6/10 as the maximum of the two.

**Open question**: The exact means for aggregating each property in the
Vehicle Data specification is yet to be determined.

#### Recommended hardware API design

Below is a pseudo-code recommendation for the hardware API. It is not
final, but indicates the current best suggestion for the API. It has two
parts — a management API which is implemented by the vehicle device
daemon; and a property API which is implemented by each backend service
and queried by the vehicle device daemon.

Types are given in the [D-Bus type system notation].

##### Management API

Exposed on the well-known name `org.apertis.Rhosydd1` from the main daemon,
the `/org/apertis/Rhosydd1` object implements the standard
[`org.freedesktop.DBus.ObjectManager`][D-Bus ObjectManager]
interface to let clients discover, and get notified about, the registered
vehicles.

Vehicles are mapped under `/org/apertis/Rhosydd1/${vehicle_id}` and implement
the `org.apertis.Rhosydd1.Vehicle` interface:

```
interface org.apertis.Rhosydd1.Vehicle {
    readonly property s VehicleId;
    readonly property as Zones;
    method GetAttribute (
        in s zone_path,
        in s attribute_name,
        out x current_time,
        out (vdx) value,
        out (uu) metadata)
    method GetAttributeMetadata (
        in s zone_path,
        in s attribute_name,
        out x current_time,
        out (uu) metadata)
    method GetAllAttributes (
        in s zone_path,
        out x current_time,
        out a(ss(vdx)(uu)) attributes)
    method GetAllAttributesMetadata (
        in s zone_path,
        out x current_time,
        out a(ss(uu)) attributes_metadata)
    method SetAttribute (
        in s zone_path,
        in s attribute_name,
        in v value)
    method UpdateSubscriptions (
        in a(ssa{sv}) subscriptions,
        in a(ssa{sv}) unsubscriptions)
    signal AttributesChanged (
        x current_time,
        a(ss(vdx)(uu)) changed_attributes,
        a(ss(uu)) invalidated_attributes)
    signal AttributesMetadataChanged (
        x current_time,
        a(ss(uu)) changed_attributes_metadata)
}
```

Backends register themselves on the bus with well-known names under the
`org.apertis.Rhosydd1.Backends.` prefix and implement the same interfaces
as the main daemon; the main daemon monitors the owned names on the bus
and registers for the object manager signals in order to multiplex access
to the backends.

Each attribute managed via the vehicle attribute API is identified by a zone ID
(relative to the vehicle identified in VehicleId) and a property name.
Property names come from the Vehicle Data specification, for example:

 - [drivingMode.mode]

 - [lightStatus.highBeam]

 - com.myoem.fancySeatController.backTemperature (see [][Property naming])

Each attribute has three values associated with it:

* its value (of type v)
* its accuracy (as a standard deviation of type d, set to `0.0` for
  non-numeric values)
* the timestamp when it was last updated (of type x)

In addition, the current time is also returned, for comparison with the
time the value was last updated.

Values also have two sets of metadata (of type u) associated with them:

* availability enum
  * AVAILABLE = 1
  * NOT_SUPPORTED = 0
  * NOT_SUPPORTED_YET = 2
  * NOT_SUPPORTED_SECURITY_POLICY = 3
  * NOT_SUPPORTED_BUSINESS_POLICY = 4
  * NOT_SUPPORTED_OTHER = 5
* access flags
  * NONE = 0
  * READABLE = (1 << 0)
  * WRITABLE = (1 << 1)

The GetAttribute method must return the value of the given property in exactly
the given zone. If no such property exists in that zone, it must return
an error.

In contrast, the GetAllAttributes method must return all properties in the
given zone and all zones beneath it, so the same property name may be
returned in multiple entries (with a different zone ID each time).

To receive notification of attribute changes via the AttributesChanged and
AttributesMetadataChanged signals, clients must first register a
subscription with the UpdateSubscriptions method, specifying the
properties they are interested in.

A backend service must emit an AttributesChanged signal when one of the
properties it exposes changes, but it may wait to combine that signal
with those from other changed properties — the trade-off between latency
and notification frequency should be determined by backend service
developers.

### Hardware API compliance testing

As the vehicle-specific and third-party backend services to the vehicle
device daemon contain a large part of the implementation of this system,
there should be a compliance test suite which all backend services must
pass before being deployed in a vehicle.

If a backend service is provided by an application bundle, that
application bundle must additionally undergo more stringent app store
validation, potentially including a requirement for security review of
its code. See [][Checks for backend services].

The compliance test suite must be automated, and should include a
variety of tests to ensure that the hardware API is used correctly by
the backend service. It should be implemented as a mock D-Bus service
which mocks up the hardware management API ([][Recommended hardware API design]), and which
calls the hardware property API. The backend service
must be run against this mock service, and call its methods as normal.
The mock service should return each of the possible return values for
each method, including:

 - Success.

 - Each failure code.

 - Timeouts.

 - Values which are out of range.

It must call property API methods with various valid and invalid inputs.

The backend service must not crash or obviously misbehave (such as
consuming an unexpected amount of CPU time or memory).

As the backend service pushes data to the vehicle device daemon, the
compliance test could be trivially passed by a backend service which
pushes zero properties to it. This must not be allowed: backend services
must be run under a test harness which triggers all of their behaviour,
for all of the devices they support.
Whether this harness simulates
traffic on an underlying intra-vehicle network, or physically provides
inputs to a hardware sensor, is implementation-defined. The behaviour
must be consistently reproducible for multiple compliance test runs.

### SDK API compliance testing and simulation

Application bundle developers will not be able to test their bundles on
real vehicles easily, so a simulator should be made available as part of
the SDK, which exposes a developer-configurable set of properties to the
bundle under test. The simulator must support all properties and
configurations supported by the real vehicle device daemon, including
multiple vehicles and third-party accessories; otherwise bundles will
likely never be tested in such configurations. Similarly, it must
support varying properties over time, simulating dynamic addition and
removal of vehicles and devices, and simulating errors in controlling
actuators (for example, [][Automatic window feedback]).

The emulator should be implemented as a special backend service for the
vehicle device daemon, provided by the emulator application. That way,
it can directly feed simulated device properties into the daemon. This
backend and the emulator should only be available in the SDK, and must
never be available on production systems.

Compliance testing of application bundles is harder, but as a general
principle, any of the [][Apertis store validation] checks which *can* be
brought forward to be run by bundle developers *should* be brought
forward.

### SDK hardware

If a developer has appropriate sensors or actuators attached to their
development machine, the development version of the sensors and
actuators system should have a separate backend service which exposes
that hardware to applications for development and testing, just as if it
were real hardware in a vehicle.

This backend service must be separate from the emulator backend service
([][SDK API compliance testing and simulation]), in order to allow them to be used independently.

### Trip logging of sensor data

As well as an emulator for application developers to use when testing
their applications, it would be useful to provide pre-recorded ‘trip
logs’ of sensor data for typical driving trips which an application
should be tested against. These trip logs should be replayable in order
to test applications.

The design for this is covered in the ‘Trip logging of SDK sensor data’
section of the Debug and Logging design.

### Properties vs devices

A major design decision was whether to expose individual sensors to
bundles via the SDK API, or to expose properties of the vehicle, which
may correspond to the reading from a single sensor or to the aggregate
of readings from multiple sensors. For example, if exposing sensors, the
API would expose a gyroscope plus several accelerometers, each returning
individual one-dimensional measurements. Bundles would have to process
and aggregate this data themselves — in the majority of cases, that
would lead to duplication of code (and most likely to bugs in
applications where they mis-process the data), but it would also allow
more advanced bundles access to the raw data to do interesting things
with it. Conversely, if exposing properties, the vehicle device daemon
would pre-aggregate the data so that the properties exposed to bundles
are filtered and averaged acceleration values in three dimensions and
three angular dimensions.
This would simplify implementation within
bundles, at the cost of preventing a small class of interesting bundles
from accessing the raw data they need.

For the sake of keeping bundles simpler, and hence potentially less
buggy, this design exposes properties rather than sensors in the SDK
API. This also means that the potentially latency-sensitive aggregation
code runs in the daemon, rather than in bundles which receive the data
over D-Bus, which has variable latency.

Similarly, the hardware API must expose properties as well, rather than
individual devices. It may aggregate data where appropriate (for
example, if it has information which is useful to the aggregation
process which it cannot pass on to the vehicle device daemon). This also
means that a set of device semantics, separate from the W3C Vehicle Data
property semantics, does not have to be defined, nor does a mapping
between it and the properties.

### Property naming

Properties exposed in the SDK API must be named following the [Vehicle
Data specification], starting with the [Vehicle interface][w3-vehicle-interface].
Different parts of the specification add partial interfaces which extend
the Vehicle interface. For example, fuel configuration information
should be exposed as properties starting with fuelConfiguration:

 - [fuelConfiguration.fuelType]

 - [fuelConfiguration.refuelPosition]

Property names are formed of components (which may contain the letters
a-z, A-Z, and the digits 0-9; they must start with a letter a-z, and
must be in camelCase) separated by dots. Property names must start and
end with a component (not a dot) and contain one or more components.

If an OEM needs to expose a custom (non-standardised) property, they
must do so beneath an OEM-specific namespace, using reverse-DNS notation
for a domain which they control. For example, a vendor ‘My OEM’ whose
website is myoem.com would use properties like:

 - com.myoem.fancySeatController.backTemperature

 - com.myoem.roofRack.open

 - com.myoem.roofRack.mass

### High bandwidth or low latency sensors

Sensors which provide high bandwidth outputs, or whose outputs must
reach the bundle within certain latency bounds (as opposed to simply
being aggregated by the vehicle device daemon within certain latency
bounds), will be handled out of band. Instead of exposing the sensor
data via the vehicle device daemon, the address of some out of band
communications channel will be exposed. For video devices, this might be
a V4L device node; for audio devices it might be a PulseAudio device
identifier. Multiplexing access to the device is then delegated to the
out of band mechanism.

This considerably relaxes the performance requirements on the vehicle
device daemon, and allows the more specialist high bandwidth use cases
to be handled by more specialised code designed for the purpose.

### Timestamps and uncertainty bounds

The W3C Vehicle Data specification does not define uncertainty fields
for any of its data types (for example, VehicleSpeed contains a single
[speed field][w3-speed], measured in metres per hour). Similarly, it does not
associate a timestamp with each measurement.
However, it allows the data
types [to be extended][w3-extending-types], so the data types exposed by the vehicle
device daemon should all include an extension field specifying the
uncertainty of the measurement, in appropriate units; and another
specifying the timestamp when the measurement was taken, in monotonic
time.

> In the CLOCK_MONOTONIC sense — <http://linux.die.net/man/3/clock_gettime>

For example, the Apertis implementation of VehicleSpeed should be (using
the W3C notation):

```
interface VehicleSpeed : VehicleCommonDataType {
    readonly attribute unsigned short speed;        /* metres per hour */
    readonly attribute unsigned short uncertainty;  /* metres per hour */
    readonly attribute signed int64 timestamp;
};
```

which represents a measurement of *speed ± uncertainty* metres per hour.

### Zones

The W3C Vehicle Information Access API has a concept of ‘[zones]’
which indicate the physical location of a device in the vehicle. The
current version of the specification has a misleading ZonePosition
enumerated type which is not used elsewhere in the API. The zones which
apply to a device are specified as an array of opaque strings, which may
have values other than those in ZonePosition. Multiple strings can be
used (like tags) to describe the location of a device in several
dimensions. Furthermore, zones may be nested hierarchically as discussed
in [][Recommended hardware API design].

Apertis may extend ZonePosition with additional strings to better
describe device locations. Strings which are not defined in this
extended enumerated type must not be used.

Devices should be tagged with zone information which is likely to be
useful to application developers. For example, it is typically not
useful to know whether the engine is in the front or rear of the
vehicle, but is useful to know that a particular light is an interior
light, above the driver.

**Open question**: In addition to the current entries in ZonePosition,
what other zone strings would be useful? ‘internal’ and ‘external’?

### Registering triggers and actions

When subscribing to notifications for changes to a particular property
using the [VehicleSignalInterface] interface, a program is also
subscribing to be woken up when that property changes, even if the
program is suspended or otherwise not in the foreground.

Once woken up, the program can process the updated property value, and
potentially send a notification to the user. If the user interacts with
this notification, the program may be brought to the foreground. The
program must not be automatically brought to the foreground without user
interaction or it will steal the user’s focus, which is
distracting.

> See the draft compositor security design

Alternatively, the program could process the updated property value in
the background without notifying the user.

The VehicleSignalInterface interface may be extended to support
notifications only when a property value is in a given range; a
degenerate case of this, where the upper and lower bounds of the range
are equal, would support notifications for property values crossing a
threshold. This would most likely be implemented by adding optional min
and max parameters to the VehicleSignalInterface.subscribe() method.

### Bulk recording of sensor data

This is a slightly niche use case for the moment, and can be handled by
an application bundle running an agent process which is subscribed to
the relevant properties and records them itself.
This is less efficient
than having the vehicle device daemon do it, as it means more processes
waking up for changes in sensor data, but avoids questions of data
formats to use and how and when to send bulk data between the vehicle
device daemon and the application bundle’s agent.

If the implementation of this is moved into the vehicle device daemon,
the lifecycle of recorded data must be considered: how space is
allocated for the data’s storage, when and how the application bundle is
woken to process the data, and what happens when the allocated storage
space is filled.

### Security

The vehicle device daemon acts as a privilege boundary between all
bundles accessing devices, between the bundles and the devices, and
between each backend service. Application bundles must request
permissions to access sensor data in their manifest (see the
Applications Design document), and must separately request permissions
to interact with actuators. The split is because being able to control
devices in the vehicle is more invasive than passively reading from
sensors — it is safety critical. A sensible security policy may be to
further split out the permissions in the manifest to require specific
permissions for certain types of sensors, such as cabin audio sensors or
parking cameras, which have the potential to be used for tracking the
user. As adding more permissions has a very low cost, the recommendation
is to err on the side of finer-grained permissions.

The manifest should additionally separate lists of device properties
which the bundle *requires* access to from device properties which it
*may* access if they exist. This will allow the Apertis store to hide
bundles which require devices not supported by the user’s vehicle.

From the permissions in the manifest, AppArmor and polkit rules
restricting the program’s access to the vehicle device daemon’s API can
be generated on installation of the bundle. See [][Security domains] for
rationale.

When interacting with the vehicle device daemon, a program is securely
identified by its D-Bus connection credentials, which can be linked back
to its manifest — the vehicle device daemon can therefore check which
permissions the program’s bundle holds and accept or reject its access
request as appropriate. The vehicle device daemon therefore acts as
‘the underlying operating system’ in controlling access, in the
phrasing [used by][w3-sec] the W3C specification. It enforces the security
boundary between each bundle accessing devices, and between the intra-
and inter-vehicle networks. The vehicle device daemon forms a separate
security domain from any of the applications.

Each backend service is a separate security domain, meaning that the
vehicle device daemon is in a separate security domain from the
intra-vehicle networks.

The daemon may rate-limit API requests from each program in order to
prevent one program monopolising the daemon’s process time and
effectively causing a denial of service to other bundles by making API
calls at a high rate. This could result from badly implemented programs
which poll sensors rather than subscribing to change notifications, for
example, as well as from malicious bundles.
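
To illustrate the subscription model which avoids such polling, here is
a hedged Python sketch of a client of the recommended hardware API
described above ([][Recommended hardware API design]). The vehicle
object path, the use of the system bus and the treatment of the empty
string as the root zone are assumptions made for this example; the
property name is one used elsewhere in this document.

```python
from gi.repository import Gio, GLib

vehicle = Gio.DBusProxy.new_for_bus_sync(
    Gio.BusType.SYSTEM,
    Gio.DBusProxyFlags.NONE,
    None,
    'org.apertis.Rhosydd1',
    '/org/apertis/Rhosydd1/vehicle0',  # hypothetical vehicle ID
    'org.apertis.Rhosydd1.Vehicle',
    None)

def on_signal(_proxy, _sender, signal, parameters):
    if signal != 'AttributesChanged':
        return
    _current_time, changed, _invalidated = parameters.unpack()
    for zone, name, (value, accuracy, timestamp), _metadata in changed:
        print('%s %s = %r (accuracy %g, at %d)' % (
            zone, name, value, accuracy, timestamp))

vehicle.connect('g-signal', on_signal)

# Register interest once; the daemon then pushes changes, so the client
# never needs to poll. '' is assumed to mean the root zone here.
vehicle.call_sync(
    'UpdateSubscriptions',
    GLib.Variant('(a(ssa{sv})a(ssa{sv}))',
                 ([('', 'vehicle.throttlePosition.value', {})], [])),
    Gio.DBusCallFlags.NONE, -1, None)

GLib.MainLoop().run()  # wait for change notifications
```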

Due to its complexity, its low level in the operating system, and its
safety criticality, the vehicle device daemon requires careful
implementation and auditing by an experienced developer with knowledge
of secure software development at the operating system level and
experience with relevant technologies (polkit, AppArmor, D-Bus).

The threat model under consideration is that of a malicious or
compromised bundle which can execute any of the D-Bus SDK APIs exposed
by the daemon, with full manifest privileges for sensor access. A second
threat model is that of a compromised backend service, which can execute
any of the D-Bus hardware APIs exposed by the daemon.

#### Security domains

There are various security technologies available in Apertis for use in
restricting access to sensors and actuators. See the Security Design for
background on them; especially §9, Protecting the driver assistance
system from attacks. These technologies can only be used on the
boundaries between security domains. In this design, each application
bundle is a single security domain (encompassing all programs in the
bundle, including agents and helper programs); the vehicle device daemon
is another domain; and each of the backend services is in a separate
domain (including the vehicle networks they each use).

##### Application bundle and another application bundle or the rest of the system

Separation of the security domains of different application bundles from
each other and from the rest of the system is covered in the
Applications and Security designs.

##### Application bundle and vehicle device daemon

The boundary between an application bundle and the vehicle device daemon
is the Sensors and Actuators SDK API, implemented by the daemon and
exposed over D-Bus. The bundle’s AppArmor profile will grant access to
call any method on this interface if and only if the bundle requests
access to one or more devices in its manifest. Note that AppArmor is not
used to separate access to different sensors or actuators — it is not
fine-grained enough, and is limited to allowing or denying access to the
API as a whole.

A separate set of [polkit] rules for the bundle controls which
devices the bundle is allowed to access; these rules are generated from
the bundle’s manifest, looking at the specific devices listed. Given a
set of polkit actions defined by the vehicle device daemon, these rules
should permit those actions for the bundle.

For example, the daemon could define the polkit actions:

 - `org.apertis.vehicle_device_daemon.EnumerateVehicles`: To list the
   available vehicles or subscribe to notifications of changes in the
   list.

 - `org.apertis.vehicle_device_daemon.EnumerateDevices`: To list the
   available devices on a given vehicle (passed as the vehicle variable
   on the action) or subscribe to notifications of changes in the list.

 - `org.apertis.vehicle_device_daemon.ReadProperty`: To read a
   property, i.e. access a sensor, or subscribe to notifications of
   changes to the property value. The vehicle ID and property names are
   passed as the vehicle and property variables on the action.

 - `org.apertis.vehicle_device_daemon.WriteProperty`: To write a
   property, i.e. operate an actuator. The vehicle ID, property name
   and new value are passed as the vehicle, property and value
   variables on the action.

The default rules for all of these actions must be polkit.Result.NO.

If a bundle has access to any device, it is safe and necessary to grant
it access to enumerate *all* vehicles and devices (the `Enumerate*`
actions above) — otherwise the bundle cannot check for the presence of
the devices it requires. Knowledge of which devices are connected to the
vehicle should not be especially sensitive — it is expected that there
will not be a sufficient variety of devices connected to a single
vehicle to allow fingerprinting of the vehicle from the device list, for
example.

An application bundle, org.example.AccelerateMyMirror, which requests
access to the vehicle.throttlePosition.value property (a sensor) and the
vehicle.mirror.mirrorPan property (an actuator) would therefore have the
following polkit rule generated in
`/etc/polkit-1/rules.d/20-org.example.AccelerateMyMirror.rules`:

```js
polkit.addRule (function (action, subject) {
    if (subject.credentials != 'org.example.AccelerateMyMirror') {
        /* This rule only applies to this bundle.
         * Defer to other rules to handle other bundles. */
        return polkit.Result.NOT_HANDLED;
    }

    if (action.id == 'org.apertis.vehicle_device_daemon.EnumerateVehicles' ||
        action.id == 'org.apertis.vehicle_device_daemon.EnumerateDevices') {
        /* Always allow these. */
        return polkit.Result.YES;
    }

    if (action.id == 'org.apertis.vehicle_device_daemon.ReadProperty' &&
        action.lookup ('property') == 'vehicle.throttlePosition.value') {
        /* Allow access to this specific property. */
        return polkit.Result.YES;
    }

    if (action.id == 'org.apertis.vehicle_device_daemon.WriteProperty' &&
        action.lookup ('property') == 'vehicle.mirror.mirrorPan') {
        /* Allow access to this specific property,
         * with user authentication. */
        return polkit.Result.AUTH_USER;
    }

    /* Deny all other accesses. */
    return polkit.Result.NO;
});
```

In the rules, the subject is always the program in the bundle which is
requesting access to the device.

**Open question**: What is the exact security policy to implement
regarding separation of sensors and actuators? For example, bundle
access to sensors could always be permitted without prompting by
returning polkit.Result.YES for all sensor accesses; but actuator
accesses could always be prompted to the user by returning
polkit.Result.AUTH_SELF. The choice here depends on the desired user
experience.

##### Vehicle device daemon and a backend service

The boundary between the vehicle device daemon and one of its backend
services is the Sensors and Actuators hardware API, implemented by the
daemon and exposed over D-Bus. The backend service’s AppArmor profile
will grant access to call any method on this interface. Note that
AppArmor is not used to grant or deny permissions to expose particular
properties — it is not fine-grained enough, and is limited to allowing
or denying access to the API as a whole.

In order to limit the potential for a compromised backend service to
escalate its compromise into providing malicious sensor data for any
sensor on the system, each backend service must install a file which
lists the Vehicle Data properties it might possibly ever provide to the
vehicle device daemon. The vehicle device daemon must reject properties
from a backend service which are not in this list. The list must not be
modifiable by the backend service after installation (i.e. it must be
read-only, readable by the vehicle device daemon).
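
The format of that installed property list is not defined by this
document, but the enforcement the daemon performs can be summarised in a
few lines. The following Python sketch is illustrative only; the
function name is invented for this example, and the property names are
taken from examples elsewhere in this document.

```python
def filter_backend_properties(declared: set, reported: dict) -> dict:
    """Drop any property which a backend reports but did not declare at
    install time, so a compromised backend cannot spoof other devices."""
    accepted = {}
    for name, value in reported.items():
        if name in declared:
            accepted[name] = value
        else:
            print('Rejecting undeclared property from backend:', name)
    return accepted

# Declared at install time in the backend’s read-only property list.
declared = {'com.myoem.roofRack.open', 'com.myoem.roofRack.mass'}

print(filter_backend_properties(declared, {
    'com.myoem.roofRack.mass': 12.5,
    'vehicle.throttlePosition.value': 0.9,  # not declared: rejected
}))
```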

Furthermore, if a backend service is found to be exploitable after being
deployed, it must be possible for the vehicle device daemon to disable
it. This is expected to typically happen with backend services provided
by application bundles, as opposed to those provided by OEMs or third
parties (as these should go through stricter review, and disabling them
would have a much larger impact). The vehicle device daemon must have a
blacklist of backend services which it never loads. It must check the
credentials of D-Bus messages from backend services against this
blacklist.

> Using GetConnectionCredentials, which returns an unforgeable
> identifier for the peer:
> <http://dbus.freedesktop.org/doc/dbus-specification.html#bus-messages-get-connection-credentials>

In order to support one (vulnerable) version of a backend
service being blacklisted, but not the next (fixed) version, the
blacklist must contain version numbers, which should be compared against
the installed version number of the backend service as listed in the
system-wide application bundle manifest store.

##### Vehicle device daemon and the rest of the system

The vehicle device daemon itself must not be able to access any of the
vehicle buses or any networks. It must be run as a unique user, which
owns the daemon’s binary, with its DAC permissions set such that other
users (except root) cannot run it. It must not have access to any device
files. See §9, Protecting the driver assistance system from attacks, of
the Security design for more details.

##### Backend service and another backend service or the rest of the system

In order to guarantee it is the only program which can access a
particular vehicle bus or network, each backend service should run as a
unique user. The service’s binary must be owned by that user, with its
DAC permissions set such that other users (except root) cannot run it.
Any device files which it uses for access to the underlying vehicle
networks must be owned by that user, with their DAC permissions set such
that other users cannot access them, and udev rules in place to prevent
access by other users. If the backend needs access to a (local) network
interface to communicate with the vehicle network buses, that interface
must be put in a separate network namespace, and the `CLONE_NEWNET` flag
used when spawning the backend service to put it in that namespace. This
prevents the service from accessing other network interfaces, and
prevents other processes from accessing the buses. See §9, Protecting
the driver assistance system from attacks, of the Security design for
more details.

##### SDK emulator

Typically, it should not be possible for one program to have access to
both the vehicle device daemon’s SDK API and its hardware API (this
access is controlled by AppArmor). However, the SDK emulator is a
special case which needs access to both — so either this must be
possible as a special case, or the SDK emulator must be split into a
backend service process and a UI process, which communicate via another
D-Bus connection.

#### Apertis store validation

Application bundles which request permissions to access devices must
undergo additional checks before being put on the Apertis store. This is
especially important for bundles which request access to actuators, as
those bundles are then potentially safety critical.

##### Checks for access to sensors

Suggested checks for bundles requesting read access to sensors:

 - The bundle does not send privacy-sensitive data to services outside
   the user’s control (for example, servers not operated by the user;
   see the [User Data Manifesto]), either via network transmission,
   logging to local storage, or other means, without the user’s
   consent. Any data sent *with* the user’s consent must only be sent
   to services which follow the User Data Manifesto. For example (this
   list is not exhaustive):

    - Tracking the vehicle’s movements.

    - Monitoring the user’s conversations (audio recording).

 - The bundle does not have access to uniquely identifiable
   information, such as a vehicle identification number (VIN). Any
   exceptions to this would need stricter review.

 - The bundle clearly indicates when it is gathering privacy-sensitive
   data from sensors. For example, a ‘recording’ light displayed in the
   UI when listening using a microphone.

##### Checks for access to actuators

Suggested checks for bundles requesting write access to actuators:

 - The bundle does not additionally have network access.

 - Actuators are only operated while the vehicle is not driving. Any
   exceptions to this would need even stricter review.

 - Manual code review of the entire bundle’s source code by a developer
   with security experience. The entire source code must be made
   available for review by the bundle developer, as it is all run in
   the same security domain. For example (this list is not exhaustive):

    - Looking for ways the bundle could potentially be exploited by an
      attacker.

    - Checking that the bundle cannot use the actuator inappropriately
      during normal operation if it encounters unexpected
      circumstances. (For example, checking that arithmetic bugs don’t
      exist which could cause an actuator to be operated at a greater
      magnitude than intended by the bundle developer.)

**Open question**: The specific set of Apertis store validation checks
for bundles which access devices is yet to be finalised.

##### Checks for backend services

Suggested checks for backend services for the vehicle device daemon,
whether they are provided by an OEM, a third party or as part of an
application bundle:

 - The backend service does not additionally have network access.

 - The backend service does not have write access to any of the file
   system except devices it needs, and the D-Bus socket.

 - The backend service cannot access any more device nodes than it
   needs to support its devices.

 - Manual code review of the entire bundle’s source code by a developer
   with security experience. The entire source code must be made
   available for review by the bundle developer, as it is all run in
   the same security domain. For example (this list is not exhaustive):

    - Looking for ways the backend service could potentially be
      exploited by an attacker.

    - Checking that the backend service cannot use any of its actuators
      inappropriately during normal operation if it encounters
      unexpected circumstances. (For example, checking that arithmetic
      bugs don’t exist which could cause an actuator to be operated at
      a greater magnitude than intended by the developer.)

 - The backend service’s D-Bus service is only accessible by the
   vehicle device daemon (as enforced by AppArmor).
+ + - If other software is shipped in the same application bundle, it must + be considered to be part of the same security domain as the backend + service, and hence subject to the same validation checks. + + - The backend service must pass the automated compliance test ([][Hardware API compliance testing]). + + - The backend service must not expose any properties which are not + supported by the version of the vehicle device daemon which it + targets as its minimum dependency (see [][Vehicle device daemon] for information + about the extension process). + +### Suggested roadmap + +Due to the large amount of work required to write a system like this +from scratch, it is worth exploring whether it can be developed in +stages. + +The most important parts to finalise early in development are the SDK +and hardware APIs, as these need to be made available to bundle +developers and OEMs to develop bundles and the backend services. There +seems to be little scope for finalising these APIs in stages, either +(for example by releasing property access APIs first, then adding +vehicle and device enumeration), as that would result in early bundles +which are incompatible with multi-vehicle configurations. + +Similarly, it does not seem to be possible to implement one of the APIs +before the other. Due to the fragmented nature of access to vehicle +networks, the backend needs to be written by the OEM, rather than +relying on one written by Apertis for early versions of the system. + +Furthermore, the security implementation for the vehicle device daemon +must be part of the initial release, as it is safety critical. + +One area where phased development is possible is in the set of +properties itself — initial versions of the daemon and backends could +implement a small, core set of the properties defined in the [W3C Vehicle +Data specification][w3c-vehicle-data], and future versions could expand that set of +properties as time is available to implement them. As each property is a +public API, it must be supported as part of the SDK one it has appeared +in a released version of the daemon, so it is important to design the +APIs correctly the first time. + +Similarly, the scope for backend services could be expanded over time. +Initial releases of the system could allow only backend services written +by vehicle OEMs to be used; with later releases allowing third-party +backend services, then ones provided by installed application bundles. + +The emulator backend service ([][SDK API compliance testing and simulation]) +and any SDK hardware backend +services ([][SDK hardware]) should be implemented early on in development, as +they should be relatively simple, and having them allows application +developers to start writing applications against the service. + +### Requirements + + - [][Enumeration of devices]: The availability of known properties of + the vehicle can be checked through the [Availability interface]. + The W3C approach considers properties, rather than devices, to be + the enumerable items, but they are mostly equivalent (see [][Properties vs devices]). + + - [][Enumeration of vehicles]: The availability of objects + implementing the W3C Vehicle interface on D-Bus is exposed using an + interface like the D-Bus ObjectManager API. + + - [][Retrieving data from sensors]: Properties can be retrieved + through the [VehicleInterface interface][w3-vehicle-interface]. 
For high bandwidth + sensors, or those with latency requirements for the end-to-end + connection between sensor and bundle, data is transferred out of + band (see [][High bandwidth or low latency sensors]). + + - [][Sending data to actuators]: Properties can be set through the + [VehicleSignalInterface] interface. As with getting properties, + data for high bandwidth or low latency sensors is transferred out of + band. + + - [][Network independence]: The vehicle device daemon abstracts + access to the underlying buses, so bundles are unaware of it. + + - [][Bounded latency of processing sensor data]: The vehicle device + daemon should have its scheduling configuration set so that it can + provide latency guarantees for the underlying buses. + + - [][Extensibility for OEMs]: Extensions are standardised through + Apertis and released in the next version of the Sensors and + Actuators API for use by the OEM. + + - [][Third-party backends]: Backend services for the vehicle device + daemon can be installed as part of application bundles (either + built-in or store bundles). + + - [][Third-party backend validation]: Backend services must be + validated before being installed as bundles (see [][Checks for backend services]). + + - [][Notifications of changes to sensor data]: Property changes are + notified via a publish–subscribe interface on + [VehicleSignalInterface]. Notification thresholds are supported + by optional parameters on that interface. + + - [][Uncertainty bounds]: The W3C API is extended to include + uncertainty bounds for measurements. + + - [][Failure feedback]: Through its use of [Promises], the API + allows for failure to set a property. + + - [][Timestamping]: The W3C API is extended to include timestamps + for measurements. + + - [][Triggering bundle activation]: Programs are woken by + subscriptions to property changes (see [][Registering triggers and actions]). + + - [][Bulk recording of sensor data]: **Not currently implemented**, + but may be implemented in future as a straightforward extension to + the API. See [][Bulk recording of sensor data]. + + - [][Sensor security]: Access to the Sensors and Actuators API is + controlled by an AppArmor profile generated from permissions in the + manifest. Access to individual sensors is controlled by a polkit + rule generated from the same permissions. See [][Security]. + + - [][Actuator security]: As with [][Sensor security]; sensors and actuators are + listed and controlled by the polkit profile separately. + + - [][App-store knowledge of device requirements]: As devices + required by an application bundle are listed in the bundle’s + manifest (see [][Security]), the Apertis store knows whether the bundle + is supported by the user’s vehicle. + + - [][Accessing devices on multiple vehicles]: Each vehicle is + exposed as a separate D-Bus object, each implementing the W3C + Vehicle interface. + + - [][Third-party accessories]: Properties for third-party + accessories must be standardised through Apertis and exposed as + separate interfaces on the vehicle object on D-Bus. + + - [][SDK hardware support]: SDK hardware should be supported through + a separate development-only backend service written specifically for + that hardware. + +## Open questions + +1. [][Hardware and app APIs]: The exact definition of the SDK API is yet to be finalised. It + should include support for accessing multiple properties in a single + IPC round trip, to reduce IPC overheads. + +2. 
[][Interactions between backend services]: The exact means for
+    aggregating each property in the Vehicle Data specification is yet
+    to be determined.
+
+3. [][Zones]: In addition to the current entries in ZonePosition, what
+   other zone strings would be useful? ‘internal’ and ‘external’?
+
+4. [][Security domains]: What is the exact security policy to implement
+   regarding separation of sensors and actuators? For example, bundle
+   access to sensors could always be permitted without prompting by
+   returning polkit.Result.YES for all sensor accesses, while actuator
+   accesses could always be prompted to the user by returning
+   polkit.Result.AUTH\_SELF. The choice here depends on the desired
+   user experience.
+
+5. [][Apertis store validation]: The specific set of Apertis store
+   validation checks for bundles which access devices is yet to be
+   finalised.
+
+## Summary of recommendations
+
+As discussed in the above sections, we recommend:
+
+  - Implementing a vehicle device daemon which exposes the W3C Vehicle
+    Information Access API; this will probably need to be developed from
+    scratch.
+
+  - Documenting the hardware API and distributing it to OEMs, third
+    parties and application developers along with a compliance test
+    suite and a common utility library to allow them to build backend
+    services for accessing vehicle networks.
+
+  - Documenting the SDK API and distributing it to application bundle
+    developers along with a validation suite and simulator to allow them
+    to build programs which use the API.
+
+  - Providing example trip logs for journeys to test against and a
+    method for replaying them via the vehicle device daemon, so
+    application developers can test their applications.
+
+  - Defining how to aggregate multiple values of each property in the
+    W3C Vehicle Data API.
+
+  - Extending the W3C Vehicle Information Access API to expose
+    uncertainty and timestamp data for each property.
+
+  - Extending the W3C Vehicle Information Access API to expose multiple
+    vehicles and notify of changes using an interface like D-Bus
+    ObjectManager.
+
+  - Extending the W3C Vehicle Information Access API to support a range
+    of interest for property change notifications.
+
+  - Extending the W3C Vehicle Information Access API to define more zone
+    positions for describing the physical location of devices in the
+    vehicle.
+
+  - Adding a property to the application bundle manifest listing which
+    device properties programs in the bundle may access if they exist.
+
+  - Adding a property to the application bundle manifest listing which
+    device properties programs in the bundle require access to.
+
+  - Extending the Apertis store validation process to include relevant
+    checks when application bundles request permissions to access
+    sensors (privacy sensitive) or actuators (safety critical), or when
+    application bundles request permissions to provide a vehicle device
+    daemon backend service (safety critical).
+
+  - Modifying the Apertis software installer to generate AppArmor rules
+    to allow D-Bus calls to the vehicle device daemon if device
+    properties are listed in the application bundle manifest.
+
+  - Modifying the Apertis software installer to generate polkit rules to
+    grant an application bundle access to specific devices listed in the
+    application bundle manifest.
+
+  - Implementing and auditing strict DAC and MAC protection on the
+    vehicle device daemon and each of its backend services, and identity
+    checks on all calls between them.
+
+  - Defining a feedback and standardisation process for OEMs to request
+    new properties or device types to be supported by the vehicle device
+    daemon’s API.
+
+# Sensors and Actuators API
+
+This section aims to compare the current status of the Vehicle device
+daemon for the sensors and actuators SDK API ([Rhosydd]) with the latest
+W3C specifications: the [Vehicle Information Service Specification] API
+and the [Vehicle Signal Specification] data model.
+
+It will also explain the required changes to align [Rhosydd] to the new
+W3C specifications.
+
+## Rhosydd API Current State
+
+The current [Rhosydd API] is stable and usable, implementing the
+[Vehicle Information Access API] and using the data model specified by
+the [Vehicle Data specification].
+
+Nevertheless, it is no longer strictly aligned with the W3C API since
+the W3C working group chose to write two different specifications rather
+than extending the existing one. Rhosydd would need a moderate
+refactoring in order to support the new
+[Vehicle Information Service Specification] API and the new data model
+of the [Vehicle Signal Specification].
+
+## Considerations to align Rhosydd to the new VISS API
+
+1) The original Vehicle API and the Rhosydd API don't exactly match 1:1,
+as the latter has been adapted to follow inter-process D-Bus constraints
+and best practices, which are somewhat different from those for an
+in-process JavaScript API.
+
+2) The new VISS API and its data model are largely different from the
+original Vehicle API and data model upon which Rhosydd is based.
+
+## New vs Old Specification
+
+1) Rhosydd follows the [Vehicle Data specification] data model using
+   attributes (data) and interface objects, where VISS uses the
+   [Vehicle Signal Specification] data model, which is based on a signal
+   tree structure containing different entity types (branches, rbranches,
+   signals, attributes, and elements).
+
+2) The [Vehicle Information Service Specification] API objects are
+   defined as JSON objects that will be passed between the client and
+   the VIS Server, where Rhosydd is currently based on accessing
+   attribute values using interface objects.
+
+3) VISS defines a set of **Request Objects** and **Response Objects**
+   (defined as JSON schemas): every message the client passes to the
+   server must be one of the defined request objects and, in the same
+   way, every message returned by the server must be one of the defined
+   response objects.
+
+4) The request and response parameters contain a number of attributes,
+   among them the Action attribute, which specifies the type of action
+   requested by the client or delivered by the server.
+
+5) VISS lists well defined actions for client requests: authorize,
+   getMetadata, get, set, subscribe, subscription, unsubscribe,
+   unsubscribeAll.
+
+6) The [Vehicle Signal Specification] introduces the concept of
+   **signals**. A signal is just a named entity with a producer (or
+   publisher); its value can change over time, and it has a type and
+   optionally a unit type defined.
+
+7) The [Vehicle Signal Specification] data model introduces a signal
+   specification format. This specification is a YAML list in a single
+   file, called the **vspec** file. This file can also be generated in
+   other formats (JSON, FrancaIDL), and basically defines the signal and
+   data structure tree.
+
+8) The Vehicle Signal Specification introduces the concept of signal ID
+   databases. These are generated from the vspec files, and they
+   basically map signal names to IDs that can be used for easy indexing
+   of signals without providing the fully qualified signal name.
+
+## Rhosydd Required Changes
+
+- The [Vehicle Information Service Specification] API defines the
+  Request and Response Objects using a JSON schema format. The
+  [Rhosydd API] (both the application-facing and backend-facing ones)
+  need to be updated to provide a similar API based on idiomatic D-Bus
+  methods and types.
+
+- Map the different VISS server actions for handling client requests to
+  corresponding D-Bus methods in Rhosydd.
+
+- The internal Rhosydd data model needs to be updated to support all the
+  element types defined in the [Vehicle Signal Specification].
+
+- It might also be necessary to add support for processing signal ID
+  databases in order for Rhosydd to recognize signals specified by the
+  Vehicle Signal Specification.
+
+## Advantages
+
+- The new VISS spec is based on a WebSocket API, which resembles the
+  inter-process mechanism based on D-Bus in Rhosydd more closely than
+  the in-process JavaScript mechanism defined by the previous
+  specification.
+
+## Conclusion
+
+The main effort will be updating the internal Rhosydd data model to
+reflect the changes introduced in the [Vehicle Signal Specification]
+data model, with the extended types and metadata.
+
+The D-Bus APIs, both on the application and backend sides, will need to
+be updated to map to the new data model. From a high-level point of view
+the old and new APIs are relatively similar, but a non-trivial amount of
+change is expected to map the new concepts and to align with the new
+terminology.
+
+The [Rhosydd] client APIs for applications (librhosydd) and backends
+(libcroesor) will need to be updated to reflect the changes in the
+underlying D-Bus APIs.
+
+## Appendix: W3C API
+
+For the purposes of completeness, the
+[W3C Vehicle Information Access API][Vehicle Information Access API] is
+reproduced below. This is the version from the Final Business Group
+Report of 24 November 2014, and does not include the
+[Vehicle Data specification][w3-vehicle-data] for brevity. The API is
+described as [WebIDL], and partial interfaces have been merged.
+
+```webidl
+partial interface Navigator {
+    readonly attribute Vehicle vehicle;
+};
+
+[NoInterfaceObject]
+interface Vehicle {
+    /* Extended with properties by the Vehicle Data specification. */
+};
+
+enum ZonePosition {
+    "front",
+    "middle",
+    "right",
+    "left",
+    "rear",
+    "center"
+};
+
+interface Zone {
+    attribute DOMString[] value;
+    readonly attribute Zone driver;
+    boolean equals (Zone zone);
+    boolean contains (Zone zone);
+};
+
+callback VehicleInterfaceCallback = void (object value);
+callback AvailableCallback = void (Availability available);
+
+enum VehicleError {
+    "permission_denied",
+    "invalid_operation",
+    "timeout",
+    "invalid_zone",
+    "unknown"
+};
+
+[NoInterfaceObject]
+interface VehicleInterfaceError {
+    readonly attribute VehicleError error;
+    readonly attribute DOMString message;
+};
+
+interface VehicleInterface {
+    Promise get (optional Zone zone);
+    readonly attribute Zone[] zones;
+    Availability availableForRetrieval (DOMString attributeName);
+    readonly attribute boolean supported;
+    short availabilityChangedListener (AvailableCallback callback);
+    void removeAvailabilityChangedListener (short handle);
+    Promise getHistory (Date begin, Date end, optional Zone zone);
+    readonly attribute boolean isLogged;
+    readonly attribute Date? from;
+    readonly attribute Date? to;
+};
+
+[NoInterfaceObject]
+interface VehicleConfigurationInterface : VehicleInterface {
+};
+
+[NoInterfaceObject]
+interface VehicleSignalInterface : VehicleInterface {
+    Promise set (object value, optional Zone zone);
+    unsigned short subscribe (VehicleInterfaceCallback callback, optional Zone zone);
+    void unsubscribe (unsigned short handle);
+    Availability availableForSubscription (DOMString attributeName);
+    Availability availableForSetting (DOMString attributeName);
+};
+
+enum Availability {
+    "available",
+    "not_supported",
+    "not_supported_yet",
+    "not_supported_security_policy",
+    "not_supported_business_policy",
+    "not_supported_other"
+};
+```
+
+[Rhosydd]: https://docs.apertis.org/rhosydd/index.html
+
+[Rhosydd API]: https://docs.apertis.org/rhosydd/index.html
+
+[Vehicle Information Service Specification]: https://www.w3.org/TR/vehicle-information-service/
+
+[Vehicle Signal Specification]: https://github.com/GENIVI/vehicle_signal_specification
+
+[Vehicle Information Access API]: http://www.w3.org/2014/automotive/vehicle_spec.html
+
+[Vehicle Data specification]: http://www.w3.org/2014/automotive/data_spec.html
+
+[w3c-spec-ext]: http://www.w3.org/2014/automotive/data_spec.html#Extending
+
+[W3C Automotive and Web Platform]: https://www.w3.org/community/autowebplatform/
+
+[Web API Vehicle]: http://git.projects.genivi.org/?p=web-api-vehicle.git;a=blob_plain;f=doc/WebAPIforVehicleData.pdf;hb=HEAD
+
+[Apple HomeKit]: https://developer.apple.com/homekit/
+
+[homekit-simulator]: https://developer.apple.com/library/ios/documentation/NetworkingInternet/Conceptual/HomeKitDeveloperGuide/TestingYourHomeKitApp/TestingYourHomeKitApp.html#//apple_ref/doc/uid/TP40015050-CH7-SW1
+
+[External Accessory API]: https://developer.apple.com/library/ios/featuredarticles/ExternalAccessoryPT/Introduction/Introduction.html
+
+[accessory-versioning]: https://developer.apple.com/library/ios/documentation/ExternalAccessory/Reference/EAAccessory_class/index.html#//apple_ref/occ/instp/EAAccessory/modelNumber
+
+[accessory-sessions]: https://developer.apple.com/library/ios/documentation/ExternalAccessory/Reference/EASession_class/index.html#//apple_ref/occ/instp/EASession/accessory
+
+[CarPlay]: http://www.apple.com/uk/ios/carplay/
+
+[carplay-no-sensor]: http://www.tomsguide.com/us/apple-carplay-faq,news-18450.html
+
+[carplay-api]: https://developer.apple.com/carplay/
+
+[Android Auto]: https://www.android.com/auto/
+
+[android-auto-api]: https://developer.android.com/training/auto/index.html
+
+[MirrorLink]: http://www.mirrorlink.com/apps
+
+[Car Connectivity Consortium]: http://carconnectivity.org/
+
+[mirrorlink-brochure]: http://carconnectivity.org/public/files/files/MirrorLink_2pgBrochure_0.pdf
+
+[Windows in the Car]: http://www.techradar.com/news/car-tech/microsoft-sets-its-sights-on-apple-carplay-with-windows-in-the-car-concept-1240245
+
+[Android's Sensor API]: http://developer.android.com/guide/topics/sensors/index.html
+
+[SensorManager]: http://developer.android.com/reference/android/hardware/SensorManager.html
+
+[Android-sensors]: http://developer.android.com/reference/android/hardware/SensorManager.html#registerListener%28android.hardware.SensorEventListener,%20android.hardware.Sensor,%20int%29
+
+[Android-sensor-register]: 
http://developer.android.com/reference/android/hardware/SensorManager.html#requestTriggerSensor%28android.hardware.TriggerEventListener,%20android.hardware.Sensor%29 + +[Automotive Message Broker]: https://github.com/otcshare/automotive-message-broker + +[broker-API]: https://github.com/otcshare/automotive-message-broker/blob/master/docs/amb.in.fidl + +[AllJoyn Framework]: https://allseenalliance.org/framework + +[Allseen Alliance]: https://allseenalliance.org/ + +[Availability interface]: http://www.w3.org/2014/automotive/vehicle_spec.html#data-availability + +[w3-extending]: http://www.w3.org/2014/automotive/data_spec.html#Extending + +[w3-vehicle-interface]: http://www.w3.org/2014/automotive/vehicle_spec.html#vehicle-interface + +[D-Bus ObjectManager]: http://dbus.freedesktop.org/doc/dbus-specification.html#standard-interfaces-objectmanager + +[D-Bus type system notation]: http://dbus.freedesktop.org/doc/dbus-specification.html#type-system + +[org.freedesktop.DBus.Properties]: http://dbus.freedesktop.org/doc/dbus-specification.html#standard-interfaces-properties + +[drivingMode.mode]: https://www.w3.org/2014/automotive/data_spec.html#idl-def-DrivingMode + +[lightStatus.highBeam]: https://www.w3.org/2014/automotive/data_spec.html#idl-def-LightStatus + +[fuelConfiguration.fuelType]: https://www.w3.org/2014/automotive/data_spec.html#idl-def-FuelConfiguration + +[fuelConfiguration.refuelPosition]: https://www.w3.org/2014/automotive/data_spec.html#idl-def-FuelConfiguration + +[w3-speed]: http://www.w3.org/2014/automotive/data_spec.html#vehiclespeed-interface + +[w3-extending-types]: http://www.w3.org/2014/automotive/data_spec.html#Extending%20Existing%20Data%20Types + +[w3-zones]: http://www.w3.org/2014/automotive/vehicle_spec.html#zone-interface + +[VehicleSignalInterface]: http://www.w3.org/2014/automotive/vehicle_spec.html#widl-VehicleSignalInterface-subscribe-unsigned-short-VehicleInterfaceCallback-callback-Zone-zone + +[w3-sec]: http://www.w3.org/2014/automotive/vehicle_spec.html#security + +[polkit]: http://www.freedesktop.org/software/polkit/docs/master/polkit.8.html + +[User Data Manifesto]: https://userdatamanifesto.org/ + +[w3-vehicle-data]: http://www.w3.org/2014/automotive/data_spec.html + +[Promises]: http://www.w3.org/TR/2013/WD-dom-20131107/#promises + +[WebIDL]: http://www.w3.org/TR/WebIDL/ diff --git a/content/designs/software-development-kit.md b/content/designs/software-development-kit.md new file mode 100644 index 0000000000000000000000000000000000000000..92ad464359146f581bbe88ff68d8a53bfdff09d0 --- /dev/null +++ b/content/designs/software-development-kit.md @@ -0,0 +1,462 @@ +--- +title: SDK +short-description: Software Development Kit purpose and design + (partially-implemented, no available app validation tool, usability can be improved) +authors: + - name: Travis Reitter +--- + +# Software Development Kit + +## Definitions + +- **Application Binary Interface (ABI) Stability**: the library + guarantees API stability and further guarantees dependent + applications and libraries will not require any changes to + successfully link against any future release. The library may + add new public symbols freely. + +- **Application Programming Interface (API) Stability**: the + library guarantees to not remove or change any public symbols in + a way that would require dependent applications or libraries to + change their source code to successfully compile and link + against later releases of the library. The library may add new + public symbols freely. 
Later releases of the API-stable library
+  may include ABI breaks which require dependent applications or
+  libraries to be recompiled to successfully link against the
+  library. Compare to **ABI Stability**.
+
+- **Backwards compatibility**: the guarantee that a library will
+  not change in a way that will require existing dependent
+  applications or libraries to change their source code to run
+  against future releases of the library. This is a more general
+  term than ABI or API stability, so it does not necessarily imply
+  ABI stability.
+
+- **Disruptive release**: a release in which backwards
+  compatibility is broken. Note that this term is unique to this
+  project. In some development contexts, the term “major release”
+  is used instead. However, that term is ambiguous in general.
+
+## Software Development Kit (SDK) Purpose
+
+The primary purpose of the special SDK system image will be to enable
+Apertis application and third-party library development. It will include
+development tools and documentation to make this process as simple as
+possible for developers. A significant part of this will be the ability
+to run the SDK within the VirtualBox PC emulator. VirtualBox runs on
+ordinary x86 hardware, which tends to make development much simpler than
+a process which requires building and running in-development software
+directly on the target hardware, which will be of significantly lower
+performance relative to developer computers.
+
+## API/ABI Stability Guarantees
+
+Collabora will carry over open source software components' API and ABI
+stability guarantees into the Apertis Reference SDK API. In most cases,
+this will be a guarantee of complete API and ABI stability for all
+future releases with the same major version. Because these portions of
+Apertis will not be upgraded to later disruptive releases, these
+portions will maintain API and ABI stability at least for each major
+release of Apertis.
+
+The platform software included in the Reference system images will be in
+the form of regular Debian packages and never in the form of
+application-level packages, which are described in the “Apertis
+Supported API” document. Collabora will manage API/ABI stability of the
+platform libraries and prevent conflicts between libraries at this
+level.
+
+See the “Apertis Supported API” document for more details of specific
+components' stability guarantees and the software management of
+platform, core application, and third-party application software.
+
+## Reference System Image Composition
+
+See the document “Apertis Build and Integration”, section “Reference
+System Image Composition”.
+
+## System Image Software Licenses
+
+See the document “Apertis Build and Integration” for details on license
+checking and compliance of software contained in the system images.
+
+## Development Workflow
+
+### Typical Workflow
+
+Most developers working on specific libraries or applications will not
+be strictly dependent upon the exact performance characteristics of the
+device hardware. Even those who are performance-dependent may wish to
+work within the SDK when they aren't strictly tuning performance, as it
+will yield a much shorter development cycle.
+
+For these most-common use cases, a typical workflow will look like:
+
+1. modify source code in Eclipse
+
+2. build (for x86)
+
+3. smoke-test within the Target Simulator
+
+4. return to step 1 if necessary
+
+In order to test this code on the actual device, the code will need to
+be cross-compiled (see the document “Apertis Build and Integration
+Design”, section “App cross-compilation”). To do so, the developer would
+follow the steps above, continuing with:
+
+1. run the [][Install to target] Eclipse plugin
+
+2. test package contents on device
+
+3. return to step 1 if necessary
+
+The development workflow for the Reference and derived images themselves
+will be much more low-level and is outside the scope of this document.
+
+### On-device Workflow
+
+Some work, particularly performance tuning and graphics-intense
+application development, will require testing on a target device. The
+workflow [above][Typical workflow] handles this use case, but developing
+on a target device can save the time of copying files from a development
+machine to the device.
+
+This workflow will instead look like:
+
+1. modify source code as needed
+
+2. run the [][Install to target] Eclipse plugin
+
+3. test package contents on device
+
+4. if debugging is necessary, either
+
+   1. run the [][Remote app debugging] Eclipse plugin; or
+
+   2. open a secure shell (ssh) connection to the target device for
+      multi-process or otherwise-complex debugging scenarios
+
+5. return to step 2 if necessary
+
+### Workflow-simplifying Plugins
+
+Some of the workflow steps [above][Typical workflow] will be simplified
+by streamlining repetitive tasks and automating as much as possible.
+
+#### Install to Target
+
+This Eclipse plugin will automatically:
+
+1. build the cross-architecture Apertis app bundle
+
+2. copy the generated ARM package to the target
+
+3. install the package
+
+It will use a sysroot staging directory (as described in the document
+“Apertis Build and Integration Design”, section “App
+cross-compilation”) to build the app bundle, and SSH to copy the bundle
+to the target and install it remotely.
+
+App bundle signature validation will be disabled in the Debugging and
+SDK images, so the security system will not interfere with executing
+in-development apps.
+
+#### Remote App Debugging
+
+This Eclipse plugin will connect to a target device over SSH and, using
+information from the project manifest file, execute the application
+within GDB. The user will be able to run GDB commands as with local
+programs and will be able to interact with the application on the device
+hardware itself.
+
+This plugin will be specifically created for single application
+debugging. Developers of multi-process services will need to connect to
+the device manually to configure GDB and other tools appropriately, as
+it would be infeasible to support a wide variety of complex setups in a
+single plugin.
+
+#### Sysroot Updater
+
+This Eclipse plugin will check for a newer sysroot archive. If found,
+the newer archive will be downloaded and installed such that it can be
+used by the [][Install to target] plugin.
+
+## 3D acceleration within VirtualBox
+
+Apertis will depend heavily on the Clutter library for animations in its
+toolkit and for custom animations within applications themselves.
+Clutter requires a working 3D graphics stack in order to function.
+Without direct hardware support, this requires a software OpenGL driver,
+which is historically very slow. Our proposed SDK runtime environment,
+VirtualBox, offers experimental 3D hardware “pass-through” to achieve
+adequate performance. However, at the time of this writing, this support
+is unreliable and works only on very limited host hardware/software
+combinations.
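+
+Which OpenGL driver is actually in use can be verified from inside the
+guest by inspecting the renderer string, assuming the `glxinfo` tool
+(packaged as mesa-utils on Debian-derived systems) is installed:
+
+```sh
+# With working pass-through this reports the host GPU; with a software
+# fallback it reports the name of the software driver instead.
+$ glxinfo | grep "OpenGL renderer"
+```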
+
+We propose resolving this issue with the new “llvmpipe” software OpenGL
+driver for the Mesa OpenGL implementation. This is the
+community-supported solution to replace the current, significantly
+slower, “swrast” software driver. The upcoming versions of both the
+Fedora and Ubuntu Linux distributions will rely upon the “llvmpipe”
+driver as a fallback in the case of missing hardware support. The latest
+development version of Ubuntu 12.04, against which Collabora is
+developing the Reference system images, already defaults to “llvmpipe”.
+Additionally, the “llvmpipe” driver implements more portions of the
+OpenGL standard (which Clutter relies upon) than the “swrast” driver.
+
+In initial testing with an animated Clutter/Clutter-GTK application,
+llvmpipe performance was more than adequate for development purposes. In
+a VirtualBox guest with 2 CPU cores and 3 GiB of RAM, demo applications
+using the Roller widget displayed approximately 20-30 frames per second
+and had very good interactivity with the llvmpipe driver. In comparison,
+the same program running with the swrast driver averaged 4 frames per
+second and had very poor interactivity.
+
+While this approach will not perform as well as a hardware-supported
+implementation, and will vary depending on host machine specifications,
+it will be the most reliable option for a wide variety of VirtualBox
+host operating system, configuration, and hardware combinations.
+
+## Simulating Multi-touch in VirtualBox
+
+Because Apertis will support multi-touch events and most VirtualBox
+hosts will only have single pointing devices, the system will need a way
+to simulate multi-touch events in software. Even with adequate hardware
+on the host system, VirtualBox does not support multiple cursors, so the
+simulating software must be fully contained within the system images
+themselves.
+
+### Software-based solution
+
+We propose a software-based solution for generating multi-touch events
+within the SDK. This will require a few new, small components, outlined
+below.
+
+In the intended usage, the user would use the [][Multi-touch gesture generator]
+to perform a gesture over an application running in the Target
+Simulator as if interacting with the hardware display(s) in an Apertis
+system. The Gesture Generator will then issue commands through its
+uinput device, and the [][Uinput Gesture Device Xorg Driver] will use
+those commands to generate native X11 multi-touch events. Applications
+running within the Target Simulator will then interpret those
+multi-touch events as necessary (likely through special events in the
+Apertis application toolkit).
+
+#### Multi-touch Gesture Generator
+
+This will be a very simple user interface with a few widgets for each
+type of gesture to generate. The developer will click on a button in the
+generator to start a gesture, then perform a click-drag anywhere within
+VirtualBox to trigger a set of multi-touch events. The generator will
+draw simple graphics on the screen to indicate the type and magnitude of
+the gesture as the developer drags the mouse cursor.
+
+We anticipate the need for two gestures commonly used in popular
+multi-touch user interfaces:
+
+  - **Pinch/zoom**: the movement of a thumb and forefinger toward
+    (zoom-out) or away (zoom-in) from each other. This gesture has a
+    magnitude and position. The position allows, e.g., a map application
+    to zoom in on the position being pinched rather than requiring a
+    separate zoom into the center of the viewable area, then a drag of
+    the map.
+
+      - Zoom-in: simulated by initiating the pinch/zoom gesture from the
+        Gesture Generator, then click-dragging up-right. The distance
+        dragged will determine the magnitude of the zoom.
+
+      - Zoom-out: the same process as for zoom-in, but in the opposite
+        direction.
+
+  - **Rotate**: the movement of two points around an imaginary center
+    point. Can be performed either in a clockwise or counter-clockwise
+    direction. This gesture has a magnitude and position. The position
+    allows, e.g., a photo in a gallery app to be rotated independent of
+    the other photos.
+
+      - Clockwise: simulated by initiating the rotate gesture, then
+        click-dragging to the right. This can be imagined as a drag
+        affecting the top of a wheel.
+
+      - Counter-clockwise: the same process as for clockwise, but in the
+        opposite direction.
+
+Additional gestures could be added during the specification process, if
+necessary.
+
+When the user completes the simulated gesture, the Gesture Generator
+would issue a small number of key or movement events through a special
+uinput device (which we will refer to as the Uinput Gesture Device).
+Uinput is a kernel feature which allows “userland” software (any
+software which runs directly or indirectly on top of the kernel) to
+issue character device actions, such as key presses, releases,
+two-dimensional movement events, and so on. The input stream from this
+uinput device will be interpreted by the
+[][Uinput Gesture Device Xorg Driver].
+
+#### Uinput Gesture Device Xorg Driver
+
+This component will interpret the input stream from our Uinput Gesture
+Device and generate X11 multi-touch events. These events would, in turn,
+be handled by the windows lying under them.
+
+#### X11 Multi-touch Event Handling
+
+Windows belonging to applications running within the Target Simulator
+will need to handle multi-touch events as they would single-touch
+events, key presses, and so on. This would require adding support for
+multi-touch events to the Apertis application toolkit so that
+applications can simply handle multi-touch events the same as
+single-touch events.
+
+### Hardware-based solution
+
+An alternative to the software-based solution [above][Software-based solution]
+would be to use a hardware multi-touch pad on the host machine. This is
+a simpler solution requiring less original development, though it brings
+a risk of Windows driver issues which would be outside of our control.
+Because of this, we recommend Collabora perform further research before
+finalizing on this solution if it is preferred over the
+[][Software-based solution].
+
+The touch pad hardware would need to be well-supported in Linux but not
+necessarily in the host operating system (including Windows), because
+VirtualBox supports USB pass-through. This means that output from the
+touch pad would simply be copied from the host operating system into
+VirtualBox, where Xorg would generate multi-touch events for us.
+
+The best-supported multi-touch device for Linux is Apple's Magic
+Trackpad. This device uses a Bluetooth connection. Many Bluetooth
+receivers act as USB devices, allowing pass-through to VirtualBox. In
+case a host machine does not have a built-in Bluetooth receiver or has a
+Bluetooth receiver but does not route Bluetooth data through USB, an
+inexpensive Bluetooth-to-USB adapter could be used.
+
+Collabora has verified that multi-touch gestures on an Apple Magic
+Trackpad plugged into a Linux host can be properly interpreted within
+Debian running within VirtualBox. This suggests that a hardware-based
+solution is entirely feasible.
+
+#### Hardware Sourcing Risks
+
+Collabora investigated risks associated with selecting a single hardware
+provider for this multi-touch solution. The known risks at this point
+include:
+
+1. Apple has a history of discontinuing product lines with little
+   warning.
+
+2. As of this writing, there appear to be few alternative multi-touch
+   pointing devices which are relatively inexpensive and support
+   arbitrary multi-touch movements.
+
+In the worst case scenario, Apple could discontinue the Magic Trackpad
+or introduce a new version which does not (immediately) work as expected
+within Linux. With no immediate drop-in replacement for the Magic
+Trackpad, there would not be a replacement to recommend internally and
+to third-party developers using the Apertis SDK.
+
+However, there are several mitigating factors that should make these
+risks minor:
+
+1. Inventory for existing Magic Trackpads would not disappear
+   immediately upon discontinuation of the product.
+
+2. Discontinuation of a stand-alone multi-touch trackpad entirely is
+   very unlikely due to Apple's increasingly strong integration of
+   multi-touch gestures within iOS and Mac OS itself.
+
+3. In case Apple replaces the Magic Trackpad with a Linux-incompatible
+   version, there is significant interest within the Linux community to
+   fix existing drivers to support the new version in a timely manner.
+   For instance, Canonical multi-touch developers use the Magic
+   Trackpad for their development and will share Apertis's sourcing
+   concerns as well.
+
+4. As an ultimate fallback, the [][Multi-touch gesture generator] can be
+   recommended as an alternative source of multi-touch input.
+
+## Third-party Application Validation Tools
+
+### Two-step Application Validation Process
+
+The third-party application process will contain two main validation
+steps which mirror the application submission processes for Android and
+iOS apps. The first step, SDK-side validation, consists of checks
+performed by a tool described below. Developers may perform SDK-side
+validation as often as they like before submitting their application for
+approval. This is meant to automatically catch as many errors in an
+application as possible, as early as possible, so that applications meet
+the quality requirements for review.
+
+The second step of the application validation process is to validate
+that an application meets the app store quality requirements. It is
+recommended to set up a process where new applications automatically get
+run through this same Eclipse plugin as an initial step in review. This
+will guarantee applications meet the latest automated validation checks
+(which may not have been applied within the developer's SDK if their
+Eclipse plugin is out of date). Developers will be able to easily stay
+up-to-date with the validation tool by applying system package updates
+within the SDK, so this difference can be minimized by a small amount of
+effort on the developer's part.
+
+### App Validation Tool
+
+To streamline the third-party application submission process (which will
+be detailed in another document), Collabora will provide an Eclipse
+plugin to perform a number of SDK-side validation checks upon the
+application in development.
+
+Collabora's proposed checks are:
+
+  - **Application contains valid developer signing key** – developers
+    must create a certificate to sign their application releases so
+    that the source of application updates can be verified. This check
+    will ensure that the certificate configured for the application
+    meets basic requirements on expiration date and other criteria.
+
+  - **Manifest file is valid** – the application manifest file, which
+    will be used in the software management of third-party applications
+    on the platform, must meet a number of basic requirements including
+    a developer name, application categories, permissions, minimum SDK
+    API, and more.
+
+  - **Application builds from cleaned source tree** – this step will
+    delete files in the source tree which are neither included in the
+    project nor belong to the version control system, and perform a full
+    release build for the ARMHF architecture. Build warnings will be
+    permitted but build errors will fail this check.
+
+  - **AppArmor profile is valid** – the application's AppArmor profile
+    definition must not contain invalid syntax or conflict with the
+    Apertis global AppArmor configuration.
+
+Third-party application validation will be specified in depth in another
+document.
+
+## General approach to third-party applications
+
+In most cases, third-party applications should not need to explicitly
+validate their access to specific system resources, delegating as much
+as possible to the SDK API or to other parts of the system. Preferably,
+these applications will specify system resource requirements in their
+manifest, such as permissions the application needs to function, network
+requirements, and so on. The main advantage of having these in the
+manifest file is that shared code can perform some of the actual
+run-time resource requests.
+
+Note that this strategy implies a trade-off between how simple it is to
+write an application and how complex the supporting SDK and system
+components need to be to provide that simplicity. That is to say, it
+often makes sense to impose complexity onto applications, in particular
+when it's expected that only a few will have a given requirement or use
+case. This general approach should be kept in mind while designing the
+SDK and any other interfaces the system has with third-party
+applications and their manifests.
+
diff --git a/content/designs/software-distribution-and-updates.md b/content/designs/software-distribution-and-updates.md
new file mode 100644
index 0000000000000000000000000000000000000000..0b3146d9166216deafba8ba40b05ab2385e462ab
--- /dev/null
+++ b/content/designs/software-distribution-and-updates.md
@@ -0,0 +1,577 @@
+---
+title: Software distribution and updates
+short-description: Concepts, requirements, and examples of reliable software
+  distribution and update systems.
+authors:
+  - name: Peter Senna Tschudin
+  - name: Emanuele Aina
+---
+
+# Software distribution and updates
+
+## Introduction
+
+Apertis is a mature platform that is compatible with modern and flexible
+solutions for software distribution and software update. This document
+describes user-driven and operator-driven use cases, explores the
+challenges of each use case to extract requirements, and finally
+proposes building blocks for software distribution and software update.
+
+## Terminology
+
+### Application and services
+
+"Application" and "service" are loosely defined terms that indicate
+single functional entities from the perspective of end users. However,
+each application may be composed of more than one component:
+
+* [system services](https://wiki.apertis.org/Glossary#system-service)
+* [user services](https://wiki.apertis.org/Glossary#user-service)
+* [graphical programs](https://wiki.apertis.org/Glossary#graphical-program)
+
+From the perspectives of software updates and software distribution,
+applications and services can be deployed as part of the base operating
+system or separately as
+[bundles](https://wiki.apertis.org/Glossary#application-bundle).
+
+### Base operating system
+
+The base operating system is the core component of the software stack.
+It includes the kernel and basic userspace tools and libraries such as
+the process manager, connectivity services, and the update manager.
+Additional components like an application manager may be part of the
+base OS, depending on the intended usage.
+
+### Bundles
+
+A bundle or "application bundle" refers to a unit that represents all
+the components of an application or service. Compared to mobile phones,
+a bundle is similar to a phone "app", and we would say that an Android
+.apk file contains a bundle. Some systems refer to this concept as a
+package, but that term is strongly associated with dpkg/apt (.deb)
+packages in Debian-derived systems, and it only partially captures the
+concept of a bundle.
+
+The granularity is usually different between packages and bundles.
+Installing an application using packages is likely to involve multiple
+packages, while the bundle approach in our context goes in the direction
+of a single monolithic bundle that contains all components of an
+application. A bundle, unlike a package, offers atomic updates,
+rollback, insulation from the base operating system, insulation from
+other applications, and configurable run-time permissions for user data
+and system resources.
+
+Docker images, Flatpak bundles, and Snaps are all examples of
+application bundles.
+
+### Software distribution
+
+Software distribution is the process of delivering software to users and
+devices. It usually refers to the distribution of binaries of software
+to be installed or updated. However, software distribution is more than
+a transport layer for packages: it can include authorization, inventory,
+and deployment management.
+
+### Software updates
+
+The most common goals of an update are fixing bugs, removing security
+vulnerabilities, and adding new features to already installed software.
+Updating a software component may also involve updating the chain of
+dependencies of that software component.
+
+## Operator-driven use cases
+
+The operator is an entity with the responsibility of ensuring that the
+devices operate within pre-defined specifications. A device can have
+more than one operator, such as the manufacturer and the owner of the
+devices; the operators, not the device user, have the power to install,
+remove and update software on the devices.
+
+### Building access control devices
+
+Access control is used to restrict access to a particular place,
+building, room, or resource. To gain access an individual generally
+needs to be given permission to enter by someone who already has
+authorization.
+
+Automated building access relies on control devices to authenticate
+identity and to control physical locks. These devices use a variety of
+authentication methods such as smart cards, biometric data, and
+passwords, and can control access devices such as doors, gates and
+turnstiles.
+
+For most use cases, building access control devices are only the
+interface for more complex systems that include secure networks and
+servers. Building access control devices collect authentication data and
+send it to a server. The server then decides whether physical access
+should be granted, and sends commands back to the device for informing
+the user and for controlling the lock.
+
+Building access control devices have a critical mission. Failing to
+grant access to authorized personnel or granting access to unauthorized
+personnel can have serious consequences that can go beyond financial
+losses. Mission critical devices have strict reliability and security
+requirements, which include protection against tampering, resilience to
+user operation, and resilience to minor failures on the devices.
+
+Both the manufacturer and owner may operate a large fleet of building
+access control devices. Large fleets are vulnerable to unintended
+changes to the software stack, as these can introduce reliability and
+security issues. Low severity variability issues can be solved remotely,
+but high severity issues require manual intervention on each affected
+device.
+
+Another problem for large fleets of devices is software deployment.
+Updates and new features should be deployed to devices in the field with
+minimal risk of rendering devices unusable. Operators require
+information about the software stack (installed software, versions, and
+so on) of each device to make decisions about how and when to do
+software deployment.
+
+Device manufacturers offer on-demand development services. A new feature
+is developed for a customer and then is deployed only to the devices of
+that specific customer. Delivering the custom features requires
+conditional deployment capabilities based on business rules such as
+device owner and service level.
+
+### Robotic lawn mower
+
+A robotic lawn mower is an electric autonomous robot that cuts lawn
+grass in a pre-determined area. Common features of robotic lawn mowers
+include finding the recharging base automatically, avoiding obstacles,
+and using advanced algorithms to cover the working area efficiently.
+
+High-end robotic lawn mowers are connected to the cloud to allow the
+owner to configure and control the unit using a convenient web
+interface. The owner, acting as the operator, uses a website to
+configure the schedule and settings of the mower such as the cutting
+height. Some models also allow the operator to remotely control the
+mower.
+
+Connected robotic lawn mowers receive over-the-air updates that are
+installed when the mower is not in use, respecting the schedule that was
+configured by the operator.
+
+## User-driven use cases
+
+There are two categories of user-driven use cases. The first one is
+built on top of operator-driven use cases: in this category the device
+allows users to install and remove optional applications, but keeps the
+operator in control of system updates and system applications. In the
+second category the device is left under full control of the user,
+without any operator involvement.
+
+### Infotainment system
+
+An infotainment system is usually an interface between users and a
+vehicle showing information about the vehicle and allowing the user to
+configure options such as interior lights and air conditioning. An
+infotainment system also provides additional functionality such as
+navigation, connectivity with the user's phone, music, an Internet
+browser, and allows the user to install and remove applications.
+
+An infotainment system can offer a personalized set of features for
+different models of vehicles and for different users. Premium features
+and applications are only available for owners of premium models of the
+vehicle and for users willing to pay for them.
+
+The life cycle of an infotainment system can go beyond a decade,
+creating a challenging scenario for support and maintenance of the
+software stack. The vast majority of software components used in an
+infotainment system have a much shorter release cycle, with more than
+one release per year being common.
+
+Releases are important for software components because only the latest
+releases receive security and bug fixes. Failing to keep the software
+stack on fairly recent components results in an infotainment system with
+bugs and security vulnerabilities.
+
+On the other hand, as users interact with infotainment systems while
+driving, these devices are heavily regulated. The device requires an
+expensive certification process before deployment, and software updates
+are also subject to certification. So while updates are important for
+bug and security fixes, the structure and cost of certifying changes
+discourages overly frequent updates.
+
+Another important actor in the infotainment ecosystem is the application
+developer. Empowering the application developer results in greater
+availability of applications and in faster availability of updates.
+Having more applications is a competitive advantage for the infotainment
+system, as users may prefer the infotainment system that has more
+installable applications.
+
+Application developers need to be able to target as many different
+infotainment products as possible without being tied to the release
+cycle of each specific product. In other words, it is important for the
+developer to be as close as possible to having a single application that
+runs without changes on different infotainment systems and on different
+releases of infotainment systems.
+
+This is particularly challenging as the very long lifecycle of
+infotainment products means that there are significant differences in
+the kinds and versions of components shipped as part of the base
+operating system of different products. As such, an application
+developer should be capable of releasing and updating applications
+independently from the base operating system, and should be able to
+conveniently create bundles that are optimized for a modern development
+flow.
+
+The physical deployment characteristics of infotainment systems also
+complicate maintenance and updates. An unrecoverable failure due to an
+over-the-air update may force vehicle owners to pay a visit to the
+closest service center, making customers unhappy and potentially causing
+significant financial loss when the problem affects tens of thousands of
+vehicles.
+
+Finally, resilience to user operation is also a challenge for
+infotainment systems. Users should not be able to render the device
+inoperative, or make the device operate outside its design
+specifications, by continuous use, by changing configurations, or by
+installing/uninstalling applications.
+
+### Power and measuring tools
+
+Power tools are electrically driven tools such as drills and grinders,
+with most models being powered by batteries. Measuring tools are
+electronic devices for measuring, or helping the user to measure,
+physical properties of the environment. Examples of measuring tools are
+wall scanners, thermo cameras, and laser measures.
+ +Connected power and measuring tools can receive over-the-air updates and offer a +convenient interface for the user to adjust operating parameters and to see the +device status. The user can choose between a web interface and a mobile phone +application to interact with power and measuring tools. + +## Non-use-cases + * Product development: during product development developers need to privilege + flexibility over robustness. However robustness is of primary importance in + production environment, and as such flexibility to ease development is not a + use case. + + * Workstations: while the mechanism described here are valuable on + workstations as well, they are not the focus of this document. + +## Requirements + +### Conditional software deployment based on business rules + +It should be possible to restrict the selection of software components that +users and operators can install, remove and update based on business rules such +as payment, customer, service level, and market segment. + +It should also be possible for the operator to configure the deployment to +adhere to business rules such as available time slots for maintenance, and to +split complex deployments in batches. + +### Configurable access rights to user data and system resources + +Applications should have limited and configurable access to system resources and +user data. For example, applications should not be capable of taking screen +shots, and the music player should have access to only specific files and +folders. + +### Consistent state across devices + +Maintaining a large fleet of devices requires the software stack of each device +to be in a known state. Devices in unknown state are challenging to maintain and +may present reliability and security issues. + +### Independent release and update of application domains + +It should be possible to release and update application domains independently +from the base operating system. + +### Operator-driven software distribution and updates + +On operator-driven use cases, the operator should be capable of controlling the +software distribution and update of large fleets of devices. + +### Protecting the fleet from software deployment issues + +There should be mechanisms in place to prevent software distribution and +software update issues, such as an update that renders the devices unusable, to +affect the entire fleet of devices. + +### Resilience to distribution and update failures + +Minor problems such as an update failure due to download problem caused by a +network issue on the device side should not render the device inoperative and +should recover automatically without intervention. + +### Resilience to user operation + +User actions including installing and removing optional applications should not +render the device inoperative, or make the device to operate outside its design +specifications. + +### Software inventory + +Operators require software inventory information such as installed software, and +software version to make decisions about how and when to do software deployment. +As an example when a security vulnerability is discovered, having an overview +of how many devices are affected is important to determine the severity of the +vulnerability, and to plan a response. + +### Tampering protection + +Mission critical devices and devices subject to regulation require protection +against unauthorized modification. Users should not be allowed to modify the +devices to operate outside its design specifications. 
+
+### Unwanted changes to the software stack
+
+A common method of attacking a device consists of changing the software that
+is installed or installing malicious components. Preventing unwanted changes
+to the software stack, and preventing unauthorized software from being
+installed, eliminates an important attack vector: attacks that require changes
+to the software stack.
+
+### Updates rollback
+
+Software updates should be reversible, and should allow rolling back to a
+previous working state. This requirement applies to system software and
+applications.
+
+### User-driven software distribution
+
+The user should be capable of installing and removing software components in
+user-driven use cases.
+
+## High level features
+
+Before describing existing solutions it is necessary to group the requirements
+into features that are implemented by these solutions. One requirement may be
+related to more than one feature; for example, the requirement *Consistent
+state across devices* is related to the features *Immutable software stack*
+and *Atomic updates*.
+
+### Immutable software stack
+
+* Related requirements: *Consistent state across devices*, *Resilience to user
+  operation*, *Tampering protection*, *Unwanted changes to the software stack*
+
+One solution to address these requirements is to make the base operating
+system and the application domains immutable.
+
+### Atomic updates
+
+* Related requirements: *Consistent state across devices*, *Protecting the
+  fleet from software deployment issues*, *Resilience to distribution and
+  update failures*, *Updates rollback*
+
+Updates on traditional package-based Linux distributions are prone to errors.
+An update usually involves multiple packages, and each package update can fail
+in ways that are not trivial to automatically recover from. After a failure in
+a package-based update, the limited rollback functionality is not guaranteed
+to revert the problem, leading to manual intervention.
+
+A robust approach to updates that is capable of reliable rollbacks is called
+atomic updates. Atomic updates perform the file operations in a staging area,
+and the changes are only committed if the update is successful. When a failure
+occurs during an update, the changes are not committed and do not affect the
+file system.
+
+However, the benefits of reliable rollbacks are limited to changes made to the
+filesystem. Changes that are not file operations, such as updating the
+bootloader, are not guaranteed to roll back gracefully.
+
+### Separation between system and application domains
+
+ * Related requirements: *Conditional software deployment based on business
+   rules*, *Configurable access rights to user data and system resources*,
+   *Consistent state across devices*, *Independent release and update of
+   application domains*, *Resilience to distribution and update failures*,
+   *User-driven software distribution*
+
+These requirements are related to separating the base operating system from
+application domains in regards to software distribution, software updates, and
+execution environment.
+
+Separating the base operating system from application domains allows product
+teams to develop their products with greater independence, and offers more
+flexibility in how application domains are deployed, updated and executed, as
+the sketch below illustrates.
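+
+As a purely illustrative sketch of this separation, the base OS and the
+application domains can be served and updated through entirely independent
+channels. The commands below assume an OSTree-managed base OS and a
+Flatpak-managed application; the application ID is invented for illustration:
+
+```sh
+# Update the immutable base OS as a single atomic unit; application
+# domains are not touched by this operation.
+ostree admin upgrade
+
+# Update a single application bundle without touching the base OS.
+# "org.example.MediaPlayer" is a hypothetical application ID.
+flatpak update org.example.MediaPlayer
+```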
+
+### Deployment management
+
+ * Related requirements: *Conditional software deployment based on business
+   rules*, *Consistent state across devices*, *Operator-driven software
+   distribution and updates*, *Protecting the fleet from software deployment
+   issues*, *Resilience to distribution and update failures*, *Software
+   inventory*
+
+Software distribution is more than a transport layer for packages: it includes
+authorization, inventory, and deployment management. The software distribution
+infrastructure for traditional tools such as `apt-get` basically consists of
+static content providers that were designed to replace the previous method
+based on CDs and DVDs.
+
+This infrastructure works well for transporting packages over the network, but
+it lacks features to implement business rules based on, for example, customer,
+payment, or hardware profile. In large operator-driven fleets, the operator
+needs control over the deployment of updates and new features. It is the
+responsibility of the operator to run the deployment in conformity with
+business rules, for example to schedule a reboot at an appropriate moment, and
+to divide the deployment into batches.
+
+The main component of a deployment management solution is usually the backend
+infrastructure that interfaces with agents running on the devices. A common
+goal of deployment management is to offer easy and flexible rollout of
+software with progress monitoring, which is essential for large fleets.
+
+## Existing systems
+
+### OSTree for the base operating system
+
+For the base operating system, OSTree implements *Immutable software stack*
+and *Atomic updates*. It also offers the underlying framework to allow
+*Separation between system and application domains*.
+
+OSTree is a feature-rich deployment and update mechanism for files and
+directories in Linux. It offers transactional upgrades and rollback, is
+capable of replicating content incrementally over HTTP, supports multiple
+parallel bootable root filesystems, and has flexible support for multiple
+branches and repositories.
+
+As mentioned earlier, rolling out updates using package management tools such
+as apt-get is prone to a high degree of variability. Each update involves
+multiple packages, and each package update can fail on file operations and on
+scripts. Current package management systems have only limited rollback
+capability (see
+[apt-btrfs-snapshot](https://github.com/skorokithakis/apt-btrfs-snapshot)),
+meaning that a failure during a package update can leave the system in an
+unknown state, making it challenging to secure and maintain.
+
+Failures during an OSTree atomic update are not committed, meaning that a
+failed update has no effect on the running system. If an OSTree atomic update
+completes successfully but introduces software issues, rolling back to the
+previous working version is guaranteed to work.
+
+However, OSTree does not directly address the needs of application domains.
+For software distribution and update of application domains we recommend
+using either Flatpak or Docker.
+
+### Flatpak and Docker for applications
+
+For applications, both Flatpak and Docker implement *Immutable software
+stack*, *Atomic updates*, and *Separation between system and application
+domains*. One requirement that is also addressed by both is *Configurable
+access rights to user data and system resources*.
+
+Both Flatpak and Docker are mature and feature-rich solutions for application
+distribution and update. They offer decoupling from the system, give the
+application developer greater freedom, give the user greater control, and run
+applications insulated from the system and from other applications. These are
+advantages when compared to more conventional packaging and distribution
+systems such as dpkg and apt-get.
+
+Flatpak purposely focuses on user-level applications and services, that is,
+applications with a GUI, such as the ones to be used on an infotainment
+system. Flatpak applications are shipped in bundles named Flatpaks, and
+Flatpak uses libostree under the hood to bring OSTree's efficiency and
+robustness to application management.
+
+Docker is instead better suited for non-graphical applications. Docker ships
+containers, and it is a good solution for applications that are developed and
+deployed as a collection of loosely coupled services. In some cases container
+orchestration is used with Docker, but orchestration is a topic that goes
+beyond the scope of this document.
+
+Flatpak and Docker can fulfill similar roles for decoupling applications from
+the base OS, and there are use cases for both in Apertis. A case-by-case
+evaluation needs to be done to find the most suitable mechanism for each
+application and service. As examples: for the infotainment system use case,
+Flatpak is better suited for the applications the user can install and remove;
+for the building access control devices, Docker is a better fit for headless
+applications that collect identity data and control locking mechanisms.
+
+### Eclipse hawkBit
+
+Eclipse hawkBit implements *Deployment management*.
+
+Eclipse hawkBit is a back-end framework for deployment management of edge
+devices. It can manage both the base OS and applications, and it is relatively
+agnostic about the kind of applications used. A preliminary investigation of
+the feasibility of integrating the hawkBit-based Bosch Software Innovations
+IoT management suite with Apertis has been done with a positive outcome.
+
+### Microsoft Azure IoT Edge
+
+Microsoft Azure IoT Edge implements *Deployment management*.
+
+Microsoft Azure IoT Edge is a fully hosted suite to manage the deployment of
+Docker containers on edge devices, and it also offers deployment management
+capabilities.
+
+A preliminary Apertis image with support for Docker containers has been
+evaluated to explore the feasibility of using Apertis with Microsoft Azure IoT
+Edge.
+
+## Appstore
+
+An appstore should meet the following requirements: *Conditional software
+deployment based on business rules*, *Independent release and update of
+application domains*, *Protecting the fleet from software deployment issues*,
+*Software inventory*, and *User-driven software distribution*. It should also
+provide support for the high level feature *Deployment management* or
+integrate with an external *Deployment management* solution.
+
+An appstore is the interface that allows users to browse, buy, install,
+remove, and update applications on their devices. Users interact with an
+appstore remotely over a web frontend, and locally through an application on
+the device.
+
+The appstore sits at the highest layer of software distribution and update and
+reflects the decisions made for the lower layers. For example, the solutions
+chosen for bundles and for deployment management greatly impact the appstore
+design.
+
+As an interface with the user, the appstore verifies user credentials,
+presents the software catalog, and processes payments.
+As an interface with the deployment management layer, the appstore queries the
+software inventory and issues software distribution commands, such as
+installing an application on the user's device.
+
+Unlike a user, the operator is responsible for the health of a fleet of
+devices, and an appstore may not be part of the use case. Instead the operator
+uses an interface to change device configuration and to control the deployment
+of updates and new features.
+
+### Curridge
+
+Curridge is a custom non-upstream solution based on the Magento web commerce
+framework. At the moment Curridge has only been part of demonstrations done by
+the RBEI team, but Apertis currently ships a component, named Frome, to
+interface with it.
+
+Collabora is not aware of the current feature set, but we expect that it is
+possible to adapt Curridge to ship Flatpak bundles. However, more information
+is needed to compare the feature set with the requirements of an appstore.
+
+An alternative path is to extend Curridge to interface with external solutions
+such as Flathub and hawkBit. This interfacing could allow Curridge to focus on
+the appstore user, and offload other tasks such as deployment management and
+bundle compatibility to dedicated components.
+
+### Flathub
+
+Flathub is the upstream appstore for applications distributed via
+Flatpak.
+
+It provides a validated workflow for third-party application authors to
+[publish their
+work](https://github.com/flathub/flathub/wiki/App-Submission#how-to-submit-an-app).
+
+Applications can be browsed on Flathub itself or through the on-device
+applications for app management, such as GNOME Software or KDE Discover.
+
+Flathub does not support payments at the moment, even though there's upstream
+interest in the feature. It does not provide any remote management solution.
+
+## Summary of recommendations
+
+ * Use OSTree for the base operating system for *Immutable software stack*,
+   *Atomic updates*, and *Updates rollback*.
+ * Use Flatpak or Docker for applications for *Immutable software stack*,
+   *Atomic updates*, *Separation between system and application domains*, and
+   *Configurable access rights to user data and system resources*.
+ * Use Flathub and a Docker registry as the storage and content delivery
+   systems.
+ * For operator-driven management, provide integration with hawkBit and
+   Microsoft Azure IoT Edge.
+   * Open point: should Apertis provide a default hawkBit instance for testing
+     and guidance for product teams?
+ * Evaluate the effort to extend Curridge to interface with Flathub and
+   hawkBit.
+   * Open point: should Curridge handle deployment management or offload it to
+     another solution such as hawkBit?
+ * For user-driven application management, use Flathub on the back-end, and
+   either adapt GNOME Software or write a custom GUI application on top of
+   Flatpak for the on-device user interface (see the illustrative sketch at
+   the end of this document).
+   * Open point: should Curridge be adapted to interface with Flathub?
+
+## Reference: System updates and rollback
+
+The [System updates and
+rollback](https://designs.apertis.org/latest/system-updates-and-rollback.html)
+document contains details about technologies that are currently being used for
+software distribution and software updates, such as OSTree. Consider reading
+`System updates and rollback` after having read this document.
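+
+As a purely illustrative sketch of the user-driven flow recommended above, the
+Flatpak command line can stand in for the on-device appstore user interface; a
+GUI such as an adapted GNOME Software would perform the equivalent operations
+through libflatpak. The application ID below is invented for illustration:
+
+```sh
+# Configure Flathub as the application back-end (one-time setup).
+flatpak remote-add --if-not-exists flathub \
+    https://dl.flathub.org/repo/flathub.flatpakrepo
+
+# User-driven install, update and removal of a single application.
+flatpak install flathub org.example.Navigation
+flatpak update org.example.Navigation
+flatpak uninstall org.example.Navigation
+```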
+
diff --git a/content/designs/supported-api.md b/content/designs/supported-api.md
new file mode 100644
index 0000000000000000000000000000000000000000..572c63b6d7c4f88960cc89cbdbaabe7ade3f2973
--- /dev/null
+++ b/content/designs/supported-api.md
@@ -0,0 +1,668 @@
+---
+title: Supported API
+short-description: API and ABI stability challenges and solutions
+  (implemented)
+authors:
+  - name: Gustavo Noronha
+---
+
+# Supported API
+
+## Introduction
+
+The goal of this document is to explain the relevant issues around API
+(Application Programming Interface) and ABI (Application Binary
+Interface) stability and to make explicit the APIs and ABIs that can be
+and will be guaranteed to be available in the platform for application
+development.
+
+It will also be explained how we are going to deal with situations
+where certain components break their API/ABI.
+
+## New releases and API stability
+
+Software systems are typically composed of several components, with some
+depending on others. Components need to make assumptions about how their
+dependencies behave in order to use them. These assumptions are
+categorized as API and ABI depending on whether they are resolved at
+build time or at runtime, respectively. As components evolve over time
+and their behavior changes, so may their API and ABI.
+
+In systems composed of thousands of components, each time a component
+changes, potentially hundreds of other components could break. Fixing
+each of those components could cause other breaks in turn. Without a way
+to manage those changes, assembling and maintaining non-trivial systems
+wouldn't be a practical enterprise.
+
+To manage this complexity, components which are to be depended upon by
+others set an API/ABI stability policy. This policy states under which
+circumstances new releases can be expected to break API or ABI. This
+allows the system integrator to update to newer releases of components
+with some assurance that other components won't break as a result. These
+guarantees also allow new releases of components to simply depend upon
+the last "known-good" release of each of their dependencies instead of
+requiring them to be constantly tested against newer dependencies.
+
+Most components will keep stable branches in which API - and often ABI -
+are not allowed to break, and normally only bug fixes and minor features
+will be merged into these branches. It is generally recommended that
+components (particularly, stable ones) depend only on stable branches of
+their dependencies. Releases in a stable branch are referred to as
+"backwards compatible" because components that depend upon a given
+release will continue to work with later releases in that same branch.
+
+With libraries keeping API stability in stable branches, and with
+libraries and applications depending on stable versions of libraries,
+breaks are reduced to manageable levels.
+
+An API can consist of multiple parts: for a typical C library, the API
+will be the C function and type declarations, plus the
+gobject-introspection (GIR) description of the API. Similarly, an ABI
+can consist of multiple parts: the C function and type declarations,
+plus the D-Bus API for a system service, for example.
+
+The GIR API is especially relevant for further development of Apertis,
+as it is planned to allow apps to be written in non-C languages such as
+JavaScript.
+In this situation, API stability requires both the C declarations to be
+stable, plus the conversion of those declarations to a GIR file to be stable —
+so it is affected by changes in the implementation of the GIR scanner (the
+g-ir-scanner utility provided by gobject-introspection). This is covered
+further in [][ABI is not just library symbols].
+
+## API and ABI stability strategies
+
+There is a tension between keeping the development environment stable and
+keeping up with novelties. What follows is an investigation of how various
+mobile platforms have tackled this issue; it hopefully provides enough
+information for a practical strategic decision on how to handle that tension.
+
+### The Android approach
+
+Android makes a promise of forward-compatibility for the main Android APIs.
+Although Android is built on top of Linux and a Java virtual machine, no APIs
+of those platforms are considered to be part of the Android platform.
+
+Instead of reusing existing components and libraries, Google decided to write
+almost everything from scratch, including a C library, a graphics subsystem,
+and audio, web and multimedia subsystems and APIs.
+
+This approach has the big disadvantage of not reusing and sharing much of the
+work done by the open source community in similar projects, which means a
+significant investment and hundreds of thousands of hours of engineering time
+spent building and maintaining everything. On the plus side, those APIs and
+the underlying components they are built upon are fully controlled by Google
+and are subject to whatever requirements the Android platform has, giving
+Google full control over tilting the balance in favour of stability or
+breakthrough as it sees fit.
+
+Although Google has been very successful in keeping its API/ABI stability
+promises, it has made incompatible changes in almost every release. From API
+level 13 to 14 (in other words, from Android 3.2 to 4.0) alone there were a
+few dozen API deprecations and *[removals][Android-api-removals]*, including
+methods, class and interface fields, and so on. Each new version brings in its
+release notes a report of API differences compared to the last version. In
+addition to these, underlying component changes have caused applications to
+misbehave and crash when they assumed a certain behaviour that got changed.
+
+### The iOS approach
+
+Apple has been known for wanting to control every bit of the products they
+make. From hardware all the way to third-party application design, Apple tends
+to influence or enforce its own rules. iOS is no exception: instead of reusing
+existing open source APIs, Apple designed and built their own components and
+APIs from the ground up. The same disadvantages Android's approach has are
+also present here: instead of sharing the cost of building all of the basic
+tools with lots of developers worldwide, Apple decided to build everything
+itself, making a significant investment in terms of money and engineering
+time.
+
+The main difference between Android and iOS, though, is that Apple did not
+have to start from scratch: they had Mac OS X already, and were able to reuse
+some of the work they had done previously, although that itself brings a
+disadvantage: the need to balance the needs of the desktop use case and the
+mobile use case in a single code base. The advantages, though, are the same:
+Apple is fully in control of the system from the ground up, and can make
+decisions on tilting the balance between stability and breakthrough.
+
+Apple, like Google, has also been successful in keeping compatibility, but has
+had its set of incompatible changes in every release. The [API changes between
+iOS 4.3 and 5][iOS-api-changes], for instance, include a couple of dozen
+*removed or renamed* classes, fields and methods.
+
+### The Apertis/open source approach
+
+Open source projects like GNOME have been very successful at providing balance
+to the tension by having API/ABI stability promises, but as the need for
+technology overhauls appeared, keeping backwards compatibility has often
+proven very costly, and a choice to break compatibility and refresh the
+platform has been made.
+
+That was the case, for instance, with the recently released GNOME 3. The GNOME
+project had to some extent maintained compatibility with applications that
+were written all the way back in 2002, and had accumulated a considerable
+amount of deprecated functionality and APIs that burdened the project, slowing
+down progress and requiring a lot of maintenance work. Those had to be left
+behind in order to bring the project up to date with the expectations of the
+current decade.
+
+The big advantage of using open source components is that most of the hard
+work of building all of the pieces of infrastructure, and even some
+applications, has already been done, leaving hardware integration, application
+development, customization, specific features and QA as the main required work
+before going to market, instead of having a much larger team build everything
+from scratch or licensing proprietary components.
+
+The main disadvantage of this approach is that the decision on how to tilt the
+balance between stability and freshness is not under the full control of the
+company building the product: some decisions will be made by the projects that
+build the various components that make up the solution, which can increase the
+cost of keeping stability while still maintaining freshness.
+
+For instance: Google has full control of Android's underlying graphics stack,
+Surface Flinger, and is able to ensure its compatibility moving forward; it is
+also able to make APIs deal transparently with changes in this underlying
+layer. The same goes for Apple and its iOS. When it comes to the open source
+graphics stack, a move from the current Xorg infrastructure to the
+next-generation Wayland will break some of the underlying assumptions made by
+applications.
+
+Some of the core libraries that are part of the graphics stack are also likely
+to change, taking advantage of the API stability break imposed by the move to
+a new graphics infrastructure to also perform some changes to their core and
+APIs. Some projects may also decide to break their stability promises from
+time to time for technology overhauls, like GNOME did with GNOME 3. We will
+investigate some theoretical and real world cases in order to get a more
+concrete example of how these overhauls may present themselves, and how they
+can be handled.
+
+There are several options when dealing with backwards-incompatible novelties:
+delaying the integration of a new release, for instance, is the best way to
+guarantee stability, but that will only delay the impact of the changes.
+Building a set of APIs that abstract some of the platform can also be
+sensible: applications using high level widgets can be shielded from changes
+done at the lower levels – Clutter, Mx, and so on.
+
+To conclude: taking advantage of open source code takes away some of the
+control over the platform's future.
+While Google and Apple are able to decide exactly what happens to the
+components that make up Android and iOS in the future, someone basing their
+product on an open source platform is not. It's important to note that this is
+also the case for companies building products based on Android, and maybe even
+more so: when Google decided that Android Honeycomb would not be released,
+many companies were left without the latest version of Android to base their
+products on.
+
+Also, like GNOME, Windows and Mac OS have started afresh at some point in time
+to be able to bring their products to the next level. It is very likely there
+will come a time in which iOS and Android will go through a similar major
+change to their foundations, and companies basing their products on Android
+will have to decide how to handle the upgrade when it happens.
+
+### The role of limiting the supported API surface
+
+While the API and ABI promises made by Android and iOS have been largely kept,
+it is important to note that they do not cover everything an application may
+need. Core services like graphics and networking are covered, but more
+specific functionality is not. One example is JSON processing. JSON is one of
+the most widely used formats for exchanging data between apps and servers.
+
+There are no APIs at all for this format in iOS. Applications that need to use
+JSON need to either roll their own implementation or embed a JSON processing
+library into their application. The same goes for APIs to access YouTube and
+other Google services through its GData protocol.
+
+> See <http://www.appdevmag.com/10-ios-libraries-to-make-your-life-easier/>
+> for more examples of missing APIs and replacements that can be embedded
+
+Android has similar limitations. Android devices are not guaranteed to have
+APIs for Google services, and although add-ons exist to bolt on those APIs, in
+some cases they cannot be redistributed. For services that use GData, there is
+also an add-on library that can be embedded in the application, but there are
+no API/ABI guarantees.
+
+Imposing those limits on which APIs are guaranteed to not change (or, in
+reality, to change as little as possible) makes it possible for Android and
+iOS to lower the maintenance costs for the platform, while making it possible
+to embed libraries into applications allows applications to not be completely
+limited by the available standard APIs. Note also that embedded libraries can
+only be used by the application embedding them, avoiding inter-application
+dependencies. That is one of the reasons Collabora is suggesting that a set of
+libraries be specified to be handled as supported.
+
+### How would incompatible changes impact the product and how to handle them?
+
+This section aims at investigating some cases where a line was drawn and old
+APIs were left behind, and how products based on or simply shipping those APIs
+handled it. The arrival of GNOME 3 in early 2011 drew such a line and allowed
+for the cleanup of APIs that were almost 10 years old, with few or no forward
+compatibility breakages through that period. It provides a lot of insight into
+how to handle that kind of structural overhaul.
+
+#### The GTK+ upgrade and a Clutter API break
+
+GTK+ is the main toolkit used by the GNOME system. The upgrade to GTK+ 3.0 was
+very smooth, for such a big upgrade.
+Applications required changes, but not all applications needed to be ported at
+once, since everything that made up the library changed name, making it
+installable in parallel with GTK+ 2. This means simple applications written
+using the old toolkit still work, even if you have GTK+ 3-based applications
+installed and working. So that is exactly how distributors handled the
+situation: both libraries are installed for as long as there are applications
+that need the old one.
+
+A very similar situation would surface if Clutter and Mx happened to break
+their API and ABI promises: applications that aren't updated to use the new
+APIs and ABIs would simply continue using the older Clutter and Mx libraries.
+An additional burden would appear for the teams designing higher level
+widgets, though: the widgets would have to be supported for both library
+versions, and care would need to be taken that no application links to the old
+Clutter/Mx and to higher level widgets built with the new ones at the same
+time.
+
+There are several facilities to make this possible in the Debian packaging
+tools used by the base distribution Apertis is built on, and also in the
+development tools used by those libraries. Provided they are used correctly,
+this specific case should not prove too difficult. Most distributions that
+handled this kind of breakage spent a lot of time tuning dependencies and
+other package relationships, and making sure no interfaces other than the
+binary ones were in disagreement, though. Some of the Collabora developers who
+are participating in the Apertis project are responsible for a significant
+part of the work that has been done to make the transition smooth in Debian.
+Their experience with it is that it is a very time consuming process, with
+many corner cases and subtleties to be taken care of, and even then several
+trade-offs had to be made.
+
+#### When a core library breaks
+
+Some applications are a bit special: most browser plugins, for instance,
+relied on the browser being written in GTK+ 2 – since that is what Firefox
+uses on Linux/X11. That is not a problem for a browser built on Qt or Clutter,
+for instance, since they can look for the system GTK+ 2 library, open it and
+use its symbols to perform the initialization some plugins expect. It is a
+problem, though, for browsers written in GTK+ 3: as soon as the plugin is
+loaded there will be symbols from both GTK+ 3 and GTK+ 2 in the symbol
+resolution table, and that will lead to subtle, hard to debug bugs, and to
+crashes. That is one of the reasons why Firefox has decided not to move to
+GTK+ 3.
+
+The same happens with GStreamer plugins. If a library is used by both a
+GStreamer plugin and an application, and that library changes, the same
+problem described for browser plugins would happen. That would be the case if,
+for instance, an application uses clutter-gst – since the application and the
+clutter-gst video sink both link to Clutter, they would need to be linked to
+the same version of the library to work properly.
+
+Plugins are not the only case in which such problems happen. If a core library
+like glib breaks compatibility, similar issues will appear across the whole
+platform. Almost every application links to glib, and so do many libraries,
+including core ones like Clutter. If a new version of glib is released which
+breaks ABI, all of these would have to be migrated to the new library at once,
+otherwise symbol clashes like the ones described above would happen.
+In GNOME 3 glib has not broken compatibility, but it is expected to break it
+at some point in the (medium-term) future.
+
+As discussed in the previous section, ensuring forward compatibility after
+such a break in the ABI of glib would only be possible with a very significant
+effort, and might prove not to be viable. Collabora would recommend that
+turning points like this be treated as major upgrades to the platform,
+requiring applications to be reworked. Such upgrades can be delayed by a few
+releases to allow enough time for the applications to be updated, though.
+
+#### When a “leaf” library breaks ABI
+
+When a core library such as glib breaks, the impact will be felt throughout
+the platform, but when a library that is used only by a few components breaks
+there is more room for adjustment. It's unlikely that both libraries and
+applications would link to libstartup-notification, for instance. In such
+cases the new version of the library can be shipped along with the old one,
+and the old one can be maintained for as long as necessary.
+
+#### ABI is not just library symbols
+
+A leaf library may end up causing more issues, though, if it breaks. GNOME 3
+has provided us with an example of that: GNOME keyring is GNOME's password
+storage. It's made up of a daemon (that among other things provides a D-Bus
+service) and a client library for applications to use. GNOME keyring underwent
+a change in its protocol, and both the library and the daemon were updated.
+The library was parallel-installable with the old one, but the new daemon
+completely replaced the old one.
+
+But the old client library and the new daemon did not know how to talk to each
+other, so even though applications would not crash because of a missing
+library or missing symbols, they were not able to store or obtain passwords
+from the keyring. That is also what would happen if a D-Bus service changed
+its interface.
+
+In case something like this happens it is possible to work around the issue by
+adding code to the daemon to keep supporting the old protocol/interface, but
+this increases the maintenance burden, and the cost/benefit ratio needs to be
+properly assessed, since the cost may be significant.
+
+Similarly, the GIR interface for a library forms part of its public API. The
+GIR interface is a high-level, language-agnostic API which maps directly to
+the C API, and can be used by multiple language bindings to automatically
+allow the library to be used from those languages. Its stability depends on
+the stability of the underlying C library, plus the stability of the GIR
+generation, implemented by g-ir-scanner.
+
+#### The move to Wayland
+
+Moving to Wayland is a fairly big change, but the impact on application
+compatibility may not be that big. If applications are using only standard
+Clutter and Mx APIs (or higher level APIs built on top of them) they would
+just work. If an application relies on something related to X, though, and
+uses any of the Clutter X11 functions, then it will need to be ported.
+
+That is a good reason for making those APIs part of the unsupported set, and,
+if necessary, providing APIs as part of the higher level toolkit to
+accommodate application needs. Wayland will allow an X server to be run and
+paint to one of its windows, so extreme cases could be handled by using that
+feature, but relying on it may prove unwise.
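+
+As a purely illustrative way of spotting applications that rely on the Clutter
+X11 APIs, standard ELF inspection tools can reveal whether a binary references
+those entry points; the binary path below is hypothetical:
+
+```sh
+# List the dynamic symbols the application references and look for
+# X11-specific Clutter functions (clutter_x11_*); any match suggests the
+# application will need porting work for the move to Wayland.
+objdump -T /usr/bin/example-app | grep clutter_x11
+```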
+
+#### The GTK+ and Clutter merger
+
+There has been discussion among GNOME developers recently about merging
+Clutter and GTK+ into a single toolkit. GTK+ is a powerful toolkit with many
+years of experience built in, solving many of the problems posed by complex
+UIs, but it lacks the eye candy and some of the features people now expect in
+a modern toolkit. Clutter on the other hand has all of the eye candy and
+features one expects from a modern toolkit, but lacks the toolkit part. While
+Mx and St, the GNOME Shell's toolkit, do provide some widgets and higher level
+features, they are not nearly as fully featured and mature as GTK+. The
+existence of so many toolkits is being seen as fragmentation of the developer
+story in the GNOME platform, which also plays a role in these discussions.
+
+When the merger of Clutter and GTK+ happens, the impact and solutions would be
+pretty much the same as if Clutter and Mx broke ABI. Old libraries and
+applications using Clutter and Mx would keep working, but care would have to
+be exercised in making sure no process ends up using the two versions at the
+same time. It would also lead the project to make a decision on whether or not
+to rebase the higher level widgets on the new GTK+ 4 (as the merged library is
+called in discussions).
+
+According to the maintainers, Mx is still in use by Intel in some of their
+applications and will be used for the netbook UI in Tizen, so its medium-term
+future appears to be fairly certain at this point.
+
+## API Support levels
+
+A number of API support levels have been defined, recognizing that some bits
+of the platform are more prone to change than others, and reflecting the
+strategy of building higher level custom APIs. The custom and enabling APIs
+make up what is often called the SDK APIs. They are the ones with the stronger
+promises, and for which Collabora will try to provide smooth upgrade paths
+when changes come about, while the APIs on the lower levels will not get as
+much work, and application developers will be made aware that using them means
+the app might need to be updated for a platform upgrade.
+
+The overall strategy being considered right now to assign APIs to each of
+these support levels is to start with the minimum set of libraries required to
+run the Apertis system as part of the image, with all libraries assigned to
+the Internal APIs support level, and to gradually promote them as development
+progresses and decisions are made. The following sections describe the support
+levels.
+
+<!-- -->
+
+### Custom APIs
+
+The Custom APIs are high level APIs built on top of the middleware provided by
+Collabora. These APIs do not expose objects, types or data from the underlying
+libraries, thus providing easier and more abstract ways of working with the
+system.
+
+Examples of such APIs are the share functionality, and a number of UI
+components that have been designed and built for the platform. Collabora has
+had only limited information about these components, so an assessment of how
+effectively they shield store applications from lower support level libraries
+is currently not possible.
+
+For these components to deliver on their promise of abstracting the lower
+level APIs it is imperative that they expose no objects, data types, functions
+and so on from other libraries to the application developer. Collabora will be
+ready to assist in defining and refining the Custom APIs to cover basic
+application needs.
+
+### Enabling APIs
+
+These APIs are not guaranteed to be stable between platform upgrades, but work
+may be done on a case-by-case basis to provide a smooth migration path, with
+old versions coexisting with newer ones when possible. Most existing open
+source APIs related to core functionality fall into this support level: Mx,
+Clutter, clutter-gst, GStreamer, and so on.
+
+As discussed in [][The GTK upgrade and a Clutter API break], there are ways to
+deal with ABI/API breakage in these libraries. Keeping both versions installed
+for a while is one of them. In the short term there will be at least one set
+of API changes that will have a big impact on the Apertis project:
+[Clutter 2.0]. That new version of Clutter is one of the steps in preparation
+for a future merge of GTK+ and Clutter.
+
+It is possible that this new version of Clutter will be released while the
+Apertis project is still not too far into development for a switch to be made.
+In case that is not possible, however, a plan will need to be laid out to
+properly migrate to this new version in a future release. Being based on
+Clutter, the main SDK APIs that relate to UI will need to be ported, of
+course. Components that are based on Clutter, such as clutter-gst, will need
+to be updated too. The illustration below shows how an application process
+could end up in this situation.
+
+This would lead to the kind of problems discussed in [][When a core library
+breaks] for applications that use Clutter both directly and indirectly through
+another library that uses Clutter under the hood, for instance. An application
+that uses both SDK UI APIs and an earlier version of Clutter would have to be
+updated. An application which relies solely on Clutter would still work fine
+by just having the old version of Clutter around. The same would apply to an
+application which relies solely on the SDK UI APIs, of course.
+
+<!-- -->
+
+### OS APIs
+
+The OS APIs include low level libraries such as glib and its siblings gio and
+gdbus, as well as components such as PulseAudio, glibc and the kernel.
+Applications reaching down to these components would, as is the case for
+enabling APIs, not necessarily work without changes after a platform upgrade.
+
+### Internal APIs
+
+These are APIs used to build the Apertis system itself but not exposed to
+store applications. A library might get assigned to this support level if it
+is required to implement system features, but its API is too unstable to
+expose to store applications. Some libraries that fit this support level might
+also be in the External APIs one.
+
+### External APIs
+
+Some libraries are not core enough to warrant being shipped along with the
+main system, or are not very stable API-wise. One such example is poppler,
+which changes API and ABI fairly often and is not really required for most
+applications – it will certainly be used in the main PDF viewing application,
+but most other applications will simply yield to the system viewer when faced
+with a PDF file.
+
+That means poppler is a good candidate for bundling with the applications that
+need it instead of being part of the core supported APIs.
+
+### Differing stability levels
+
+While the Enabling, Custom, External, Internal and OS categories separate APIs
+based on the level of control and direct involvement we have over them, a
+separate dimension is needed to track the stability of APIs, with four levels:
+private, unstable, stable, and deprecated.
+An API starts as private, and can transition to any of the other levels.
+Transitions between stable and deprecated are possible, but an API can never
+change or go back to being unstable or private once it is stable — this is one
+of the stability guarantees.
+
+It may be possible to move a library from the unstable level to the stable
+level piecewise, for example by initially exposing a limited set of core
+functions as stable, while marking the rest of the API as 'currently
+unstable'. Old API could later be marked as deprecated. Further, it may be
+desirable to expose the same API at different levels for different languages.
+For example, a library might be stable for the C language, but unstable when
+used from JavaScript, pending further testing and documentation work to mark
+it as stable.
+
+This approach allows a phased introduction of stable APIs, giving sufficient
+time for them to be thoroughly reviewed and tested before committing to their
+stability.
+
+This could be implemented in the GIR files for an API, with annotations
+extracted from the gtk-doc comments of the API's C source code — gtk-doc
+currently supports a 'Stability' annotation. As an XML format, GIR is
+extensible, and custom attributes could be used to annotate each function and
+type in an API with its stability, extracted from the gtk-doc comments.
+Separate documentation manuals could then be generated for the different
+stability levels, by making small modifications to the documentation
+generation utilities in gtk-doc.
+
+Restricting less stable or deprecated parts of an API from being used by an
+app written in C is technically complex, and would likely involve compiling
+two versions of each library. It is suggested that less stable functions and
+types are always exposed, with the understanding that app developers use them
+at their own risk of having to keep up with API-incompatible changes between
+Apertis versions. Their existence would not be obvious, as they would not be
+included in the documentation for the stable API.
+
+By contrast, restricting the use of such APIs from high-level languages is
+simpler: as all language bindings use GIR, only the GIR files and the
+infrastructure which handles them need modifying to support varying the
+visibility of APIs according to their stability level. The bindings
+infrastructure already supports 'skipping' specific APIs, but this is not
+currently hooked up to their advertised stability. A small amount of work
+would be needed to enable that.
+
+### Maintaining API stability
+
+It is easy to accidentally break API or ABI stability between releases of a
+library, and once a release has been made with an API break, that break cannot
+be undone.
+
+The Debian project has some tooling to detect API and ABI changes between
+releases of a library, though this is invoked at packaging time, which is
+after the library has been officially released and hence after the damage is
+done.
+
+This tooling could be leveraged to perform the ABI checks before making a
+library release.
+
+While such tools exist for C APIs, no equivalents exist for GIR and D-Bus
+APIs; the stability of these must currently be checked manually for each
+release. As both APIs are described using XML formats, developing tools for
+checking stability of such APIs would not be difficult, and may be a prudent
+investment.
+
+## Components
+
+To illustrate how the platform APIs relate to Apertis-specific APIs, we are
+reproducing here a diagram taken from the Apertis SDK documentation.
+
+The components listed in the table below belong to the orange and green
+boxes:
+
+<!-- -->
+
+The following table has a list of libraries that are likely to be on Apertis
+images or fit into one of the supported levels discussed before. The table has
+links to documentation and comments on API/ABI stability promises made by each
+project for reference. As discussed before, fitting components into one of the
+supported levels will be an iterative process throughout development, so this
+table should not be seen as a canonical list of supported APIs.
+
+| Name | Version | API reference | Notes | API/ABI Stability Guarantees |
+| ---- | ------- | ------------- | ----- | ---------------------------- |
+| GLibc | 2.14 | http://www.gnu.org/software/libc/manual/html_node/index.html | Ubuntu uses EGLIBC | Aims to provide backwards compatibility |
+| OpenGL ES | 2.0 | http://www.khronos.org/opengles/sdk/docs/man/ | Provided by Freescale | The standard is stable and the implementation should be as well |
+| EGL | 1.4 | http://www.khronos.org/registry/egl/specs/eglspec.1.4.20110406.pdf | Provided by Freescale | The standard is stable and the implementation should be as well |
+| GLib | 2.32 | http://developer.gnome.org/glib/2.31/ | | Gnome Platform API/ABI Rules |
+| Cairo | 1.10 | http://cairographics.org/documentation/ | Tutorial, example code | Stability guaranteed in stable series |
+| Pango | 1.29 | http://developer.gnome.org/pango/stable/ | | Gnome Platform API/ABI Rules |
+| Cogl | 1.10 | http://docs.clutter-project.org/docs/cogl/unstable/ | Latest documentation currently available | Gnome Platform API/ABI Rules |
+| Clutter | 1.10 | http://docs.clutter-project.org/docs/clutter/unstable/ | Latest documentation currently available | Gnome Platform API/ABI Rules |
+| Mx | 1.4 | http://docs.clutter-project.org/docs/mx/stable/ | See warning below | Stability guaranteed in stable series |
+| GStreamer | 1.0 | http://gstreamer.freedesktop.org/data/doc/gstreamer/head/gstreamer/html/ | Development manual, plugin writer's guide | Stability guaranteed in stable series |
+| Clutter-GStreamer | 1.6 | http://docs.clutter-project.org/docs/clutter-gst/stable/ | | Stability guaranteed in stable series |
+| GeoClue | 0.12 | http://www.freedesktop.org/wiki/Software/GeoClue | | No guarantees |
+| LibXML2 | 2.7 | http://xmlsoft.org/html/index.html | Tutorial (includes some example code) | Gnome Platform API/ABI Rules |
+| libsoup | 2.4 | http://developer.gnome.org/libsoup/unstable/ | | Stability guaranteed in stable series |
+| librest | 0.7 | http://developer.gnome.org/librest/unstable/ | | Stability guaranteed in stable series |
+| libchamplain | 0.14.x | http://developer.gnome.org/libchamplain/unstable/ | | Follows Clutter version numbering and API/ABI stability plan |
+| Mutter | 3.3 | | Inlined documentation | No ABI compatibility guarantees. Still need to find out about the API |
+| ConnMan | 0.78 | http://git.kernel.org/?p=network/connman/connman.git;a=tree;f=doc;hb=HEAD | | No guarantees |
+| Telepathy-GLib | 0.18 | http://telepathy.freedesktop.org/doc/telepathy-glib/ | | Stability guaranteed in stable series |
+| Telepathy-Logger | 0.2 | http://telepathy.freedesktop.org/doc/telepathy-glib/ | | Stability guaranteed in stable series |
+| Folks | 0.6 | http://telepathy.freedesktop.org/doc/folks/c/ | | Stable in the stable series for a fixed set of gobject-introspection and Vala releases |
+| PulseAudio | 1.1 | http://freedesktop.org/software/pulseaudio/doxygen/ | http://pulseaudio.org/wiki/WritingVolumeControlUIs | The API/ABI hasn't been broken in years, but might break at some point for cleaning up |
+| Bluez | 4.98 | http://git.kernel.org/?p=bluetooth/bluez.git;a=tree;f=doc | | Stability guaranteed in stable series |
+| libstartup-notification | 0.12 | See Notes | Inlined documentation | No guarantees |
+| libecal | 3.3 | http://developer.gnome.org/libecal/3.3/ | | Stability guaranteed in stable series |
+| SyncEvolution | 1.2 | http://api.syncevolution.org/ | | No guarantees |
+| GUPnP | 0.18 | http://gupnp.org/docs | | No guarantees |
+| libGData | 0.11 | http://developer.gnome.org/gdata/unstable/ | | Stability guaranteed in stable series |
+| Poppler | 0.18 | There is minimal inline API documentation | | No guarantees |
+| libsocialweb | 0.26 | GLib-based API has no documentation | | No guarantees |
+| Grilo | 0.1 | API docs in sources | | 0.1 is intended to be stable, 0.2 will start soon and will be unstable for a while |
+| Ofono | 1.0 | http://git.kernel.org/?p=network/ofono/ofono.git;a=tree;f=doc | | No guarantees at present, but has gotten more stable recently |
+| WebKit-Clutter | 1.8.0 | | | No stable releases yet |
+| libexif | 0.6.20 | http://libexif.sourceforge.net/api/ | | No formal guarantees, but it's very stable |
+| TagLib | 1.7 | http://developer.kde.org/~wheeler/taglib/api/index.html | | |
+
+## Conclusion
+
+Open Source has been chosen in order to be able to reuse code that is freely
+available, and for its customization potential. It is also desired to keep the
+platform up-to-date with fresh new open source releases as they come about.
+While choosing to leverage Open Source software does lower cost and the
+required investment significantly, it does bring with it some challenges when
+compared to building everything and controlling the whole platform, especially
+when it comes to the tension between stability and novelty.
+
+Those challenges will have to be met and worked upon on a case-by-case basis,
+and trade-offs will have to be made. As other distributors of open source
+software have done over the years, delaying the adoption of a particular
+technology or of newer versions of a core package goes a long way in ensuring
+platform stability and providing safe and manageable upgrade paths, so it is
+certainly an option that must be considered. Other solutions should of course
+be considered and planned for, including shipping multiple versions of the
+same library in parallel. Limiting the API that is considered supported, and
+requiring that some libraries be statically linked or shipped along with the
+program, are also tools that should be used where necessary.
+
+[Android-api-removals]: http://developer.android.com/sdk/api_diff/14/changes/alldiffs_index_removals.html
+
+[iOS-api-changes]: https://developer.apple.com/library/ios/#releasenotes/General/iOS50APIDiff/index.html
+
+[Clutter 2.0]: http://wiki.clutter-project.org/wiki/ClutterChanges:2.0
diff --git a/content/designs/system-updates-and-rollback.md b/content/designs/system-updates-and-rollback.md
new file mode 100644
index 0000000000000000000000000000000000000000..af62ac94ca022100bbcba1a20237b0d35453f26a
--- /dev/null
+++ b/content/designs/system-updates-and-rollback.md
@@ -0,0 +1,1020 @@
+---
+title: System updates and rollback
+short-description: Robust updates with fallback (proof-of-concept)
+authors:
+  - name: Gustavo Noronha
+  - name: Frederic Dalleau
+  - name: Emanuele Aina
+---
+
+# System updates and rollback
+
+## Introduction
+
+This document focuses on the system update mechanism, but also partly
+addresses applications and how they interact with it.
+
+## Definitions
+
+### Base OS
+
+The core components of the operating system that are used by almost all
+Apertis users: hardware control, resource management, service lifecycle
+monitoring, networking, and so on.
+
+### Applications
+
+Components that work on top of the base OS and are specific to certain usages.
+
+## Use cases
+
+A variety of use cases for system updates and rollback are given below.
+
+### Embedded device in the field
+
+An Apertis device is shipped to a location that cannot be easily accessed by a
+technician. The device should not require any intervention in the case of
+errors during the update process, and should automatically go back to a
+known-good state if needed.
+
+The update process should be robust against power losses and low voltage
+situations, loss of connectivity, storage exhaustion, and so on.
+
+### Typical system update
+
+The user can update his system to run the latest published version of the
+software. This can be triggered via periodic polling, upon user request, or by
+any other suitable means.
+
+### Critical security update
+
+In the case of a critical security issue, the OEM could push an "update
+available" message to some component in the device that would in turn trigger
+the update. This requires an infrastructure referencing all devices on the OEM
+side. The benefit compared to periodic polling is that the delay between the
+update publication and the update trigger is shortened.
+
+### Applications and base OS with different release cadences
+
+Base OS releases involve many moving parts while application releases are
+simpler, so application authors want a faster release cadence decoupled from
+the base OS one.
+
+### Shared base OS
+
+Multiple teams using the same hardware platform want to use the same base OS
+and differentiate their product purely with applications on top of it.
+
+### Reselling a device
+
+Under specific circumstances, the user might want to reset his device to a
+clean state with no device-specific or personal data. This can happen before
+reselling the device, or after the user has encountered an unexpected failure.
+
+## Non use cases
+
+### User modified device
+
+The user has modified his device. For example, they mounted the file system
+read-write and tweaked some configuration files to customize some features. As
+a result, the system update mechanism may no longer be functional.
+
+It might still be possible to restore the operating system to a factory state,
+but the system update mechanism cannot guarantee it.
+
+### Unrecoverable hardware failure
+
+A hardware failure has damaged the flash storage or another core hardware
+component and the system is no longer able to boot. Compensating for hardware
+failures is not part of the system update mechanism.
+
+### Unrecoverable filesystem corruption
+
+The filesystem became corrupted due to a software bug or another failure, and
+the corruption cannot be corrected automatically. How to recover from that
+situation is not part of the system update and rollback mechanism.
+
+### Development
+
+Developers need to modify and customize their environment in a way that often
+conflicts with the requirements for devices in the field.
+
+## Requirements
+
+### Minimal resource consumption
+
+Some devices only have a very limited amount of available storage, so the
+system update mechanism must keep its storage requirements as low as possible
+and have a negligible impact at runtime.
+
+### Work on different hardware platforms
+
+Different devices may use different CPU architectures, bootloaders, storage
+technologies, partitioning schemas and filesystem formats.
+
+The system update mechanism must be able to work across them with minimal
+changes, ranging from single-partition systems running UBIFS on NAND devices
+to more common storage devices using traditional filesystems over multiple
+partitions.
+
+### Every updated system should be identical to the reference
+
+The filesystem contents of the base OS on the updated devices must exactly
+match the filesystem used during testing, to ensure that its behaviour can be
+relied upon.
+
+This also means that applications must be kept separate from the base OS, to
+be able to update them while keeping the base OS immutable.
+
+### Atomic update
+
+To guarantee robustness in case of errors, every update to the system must be
+atomic.
+
+This means that if an update is not successful, it must not be partially
+installed. The failure must leave the device in the same state as if the
+update had not started, and no intermediate state must exist.
+
+### Rolling back to the last known good state
+
+If the system cannot boot correctly after an update has been installed
+successfully, it must automatically roll back to a known working version.
+
+Applications must be kept separated to be able to roll back the base OS while
+preserving them, or to roll them back while keeping the base OS unchanged.
+
+The policy deciding what to roll back and when is product-specific and must be
+customizable. For instance, some products may choose to only roll back the
+base OS and keep applications untouched, while other products may choose to
+roll applications back as well.
+
+### Reset to clean state
+
+The user must be able to restore his device to a clean state, destroying all
+user data and all device-specific system configuration.
+
+### Update control interface
+
+An interface must be provided by the updates and rollback mechanism to allow
+the HMI to query the current update status, and to trigger updates and
+rollbacks.
+
+## Existing system update mechanisms
+
+### Debian tools
+
+The Debian package management system binds together all the software in the
+system. This can be very convenient and powerful for administration and
+development, but this level of management is not required for final users of
+Apertis. For example:
+
+- Package administration command line tools are not required for final users.
+- No support for update rollback.
+  If there is some package breakage, or a broken upgrade, the only way to
+  solve the issue is manually tracking the broken package and downgrading it
+  to a previous version, solving dependencies along the way. This can be an
+  error-prone manual process and might not be accomplished cleanly.
+
+### ChromeOS
+
+ChromeOS uses an A/B parallel partition approach. Instead of upgrading the system
+directly, it installs a fresh image into the B partitions (kernel and rootfs),
+then flags those to be booted next time.
+
+The partition metadata contains boot fields tracking boot attempts and
+successful boots, and these are updated on every boot. If a predetermined number
+of unsuccessful boots is reached, the bootloader falls back to the other partition,
+and it will continue booting from there until the next upgrade is available.
+When the next upgrade becomes available it will replace the failing installation
+and the system will attempt booting from there again.
+
+There are some drawbacks to this approach when compared to OSTree:
+
+- The OS installations are not de-duplicated: the system stores the entire
+  contents of the A and B installations separately, whereas OSTree-based systems
+  only store the base system plus a delta between this and any update, using Unix
+  hardlinks. This means an update to the system only requires disk space
+  proportional to the changed files.
+- The A/B approach can be less efficient, since it needs to add extra layers
+  to work with different partitions, for example, using a specific layer to
+  verify the integrity of the block devices, whereas OSTree directly handles
+  operating system views and a content-addressable data store (in filesystem
+  userspace), avoiding the need for different layers.
+- Several partitions are usually required to implement this model, reducing the
+  flexibility with which the storage space in the device can be utilised.
+
+## Approach
+
+Package-based solutions fail to meet the robustness requirements, while dual
+partitioning schemes have storage requirements that are not viable for
+smaller devices.
+
+[OSTree] provides atomic updates on top of any POSIX-compatible filesystem
+including UBIFS on NAND devices, is not tied to a specific partitioning scheme,
+and efficiently handles the available storage.
+
+No specific requirements are imposed on the partitioning schema.
+Use of the GUID Partition Table ([GPT]) system for partition management is
+recommended for being flexible while having fail-safe measures, like keeping
+checksums of the partition layout and providing some redundancy in case
+errors are detected.
+
+Separating the system volume from the general storage volume, where
+applications and user data are stored, is also recommended.
+
+
+
+More complex schemas can be used, for instance by combining OSTree
+with read-only fallback partitions to handle filesystem corruption on the main
+system partition, but this document focuses on an OSTree-only setup that
+provides a good balance between complexity and robustness.
+
+### Advantages of using OSTree
+
+- OSTree operates at the Unix filesystem layer and thus works on top of any
+  filesystem or block storage layout, including NAND flash setups, and in
+  containers.
+- OSTree does not impose strict requirements on the partitioning scheme and can
+  scale down to a single partition while fully preserving its resiliency
+  guarantees, saving space on the device and avoiding extra layers of
+  complexity (for example, to verify partition blocks).
+  Depending on the setup, multiple partitions can still be used effectively to
+  separate contents with different lifecycles, for instance by storing user
+  data on a different partition than the system files managed by OSTree.
+- OSTree commits are centrally created offline (server side), and then they are
+  deployed by the client. This gives much more control over what the devices
+  actually run.
+- It can store multiple filesystem trees in a single repository.
+- It is designed to implement fully atomic and resilient upgrades. If the
+  system crashes or power is lost at any point during the update process,
+  you will have either the old system, or the new one.
+- It clearly separates the OS from the device configuration and user data, so
+  resetting the system to a clean state simply involves deleting some
+  directories and their contents.
+- OSTree is implemented as a shared library, making it very easy to build higher
+  level projects or tools on top of it.
+- The `/usr` contents are mounted read-only from subfolders of
+  `/ostree/deploy`, minimizing the chance of accidental deletions or changes.
+- OSTree has no impact on startup performance, nor does it increase resource
+  usage during runtime: since OSTree is just a different way to build the
+  rootfs, once it is built it behaves like a normal rootfs, making it very
+  suitable for setups with limited storage.
+- OSTree already offers a mechanism suitable for offline updates using static
+  deltas, which can be used for updates via a mass-storage device.
+- Security is at the core of OSTree, offering incremental content replication
+  over HTTPS, with GPG signatures and SHA256 hash checksums.
+- The mechanism to apply partial updates or full updates is exactly the same;
+  the only difference is how the updates are generated on the server side.
+- OSTree can be used for both the base OS and applications, and its built-in
+  hardlink-based deduplication mechanism allows identical contents to be shared
+  between the two, keeping them independent while having minimal impact on the
+  needed storage. The Flatpak application framework is already based on OSTree.
+
+### The OSTree model
+
+The conceptual model behind OSTree repositories is very similar to the one used
+by `git`, to the point that the [introduction in the OSTree
+manual](https://ostree.readthedocs.io/en/latest/manual/introduction/) refers
+to it as "git for operating system binaries".
+
+Although they make different tradeoffs to address different use-cases, they
+both have:
+* file contents stored as blobs addressable by their hash, deduplicating them
+* filetrees linking filenames to the blobs
+* commits adding metadata such as dates and comments on top of filetrees
+* commits linked in a history tree
+* branches pointing to the most recent commit in a history tree, so that
+  clients can find them
+
+Where `git` focuses on small text files, OSTree focuses on large trees of
+binary files.
+
+On top of that, OSTree adds other layers which go beyond storing and
+distributing file contents to fully handle operating system upgrades:
+
+- repositories - store one or more versions of the filesystem contents as
+  described above
+- deployments - specific filesystem versions checked-out from the repository
+- stateroots - the combination of immutable deployments and writable directories
+
+Each device hosts a local OSTree repository with one or more deployments
+checked out.
+
+Checked out deployments look like traditional root filesystems.
+The bootloader
+points to the kernel and initramfs carried by the deployment which, after
+setting up the writable directories from the stateroot, are responsible for
+booting the system. The bootloader is not part of the updates and remains
+unchanged for the whole lifetime of the device, as any change has a high chance
+of making the system unbootable.
+
+- Each deployment is grouped in exactly one `stateroot`, and in normal
+  circumstances Apertis devices only have a single `apertis` stateroot.
+- A `stateroot` is physically represented in the `/ostree/deploy/$stateroot`
+  directory, `/ostree/deploy/apertis` in this case.
+- Each `stateroot` has exactly one copy of the traditional Unix `/var` directory,
+  stored physically in `/ostree/deploy/$stateroot/var`. The `/var` directory
+  is persisted during updates, when moving from one deployment to another,
+  and it is up to each operating system to manage this directory.
+- On each device there is an OSTree repository stored in `/ostree/repo`,
+  and a set of deployments stored in `/ostree/deploy/$stateroot/$checksum`.
+- A deployment begins with a specific `commit` (represented by a SHA256 hash) in
+  the OSTree repository in `/ostree/repo`. This `commit` refers to a filesystem
+  tree that represents the underlying basis of a deployment.
+- Each deployment is primarily composed of a set of hardlinks into the
+  repository. This means each version is deduplicated; an upgrade process only
+  costs disk space proportional to the new files, plus some constant overhead.
+- The read-only base OS contents are checked out in the `/usr` directory of
+  the deployment.
+- Each deployment has its own writable copy of the configuration store `/etc`.
+- Deployments don't have a traditional UNIX `/etc` but ship it instead as
+  `/usr/etc`. When OSTree checks out a deployment it performs a 3-way merge
+  using the old default configuration, the active system's `/etc`, and the new
+  default configuration.
+- Apart from the `/var` and `/etc` exceptions, the rest of the contents of the
+  tree are checked out as hardlinks into the repository.
+- Both `/etc` and `/var` are persistent writable directories that get preserved
+  across upgrades.
+
+### Resilient upgrade workflow
+
+The following steps are performed to upgrade a system using OSTree:
+
+- The system boots using the existing deployment.
+- A new version is made available as a new OSTree commit in the local
+  repository, either by downloading it from the network or by unpacking a static
+  delta shipped on a mass storage device.
+- The data is validated for integrity and appropriateness.
+- The new version is deployed.
+- The system reboots into the new deployment.
+- If the system fails to boot properly (which should be determined by the
+  system boot logic), the system can roll back to the previous deployment.
+
+During the upgrade process, OSTree takes care of many important details, such
+as managing the bootloader configuration and correctly merging the `/etc`
+directory.
+
+Each `commit` can be delivered to the target system over the air or by attaching
+a mass storage device. Network upgrades and mass storage upgrades only differ in the
+mechanism used by `ostree` to detect and obtain the update. In both cases the `commit`
+is first stored in a temporary directory and validated; only then does it become part
+of the local OSTree repository, before the real upgrade process starts by rebooting
+into the new deployment.
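+
+As a minimal sketch, assuming network delivery and a remote already configured
+in the image (the remote name and policies are product-specific), the workflow
+above roughly corresponds to the following commands:
+
+```
+ostree admin status    # inspect the currently booted and pending deployments
+ostree admin upgrade   # pull the new commit from the remote and deploy it
+systemctl reboot       # reboot into the new deployment
+```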
+
+Metadata such as EdDSA or GPG signatures can be attached to each `commit` to validate it,
+ensuring it is appropriate for the current system and has not been corrupted
+or tampered with. The update process must be interrupted should any check yield
+an invalid result; the [atomic upgrades mechanism
+in OSTree][OSTree atomic upgrades] ensures that it is safe to stop the process
+at any point and no change is applied to the system up to the last step in
+the process.
+
+The atomic upgrades mechanism in OSTree ensures that any power failure during
+the update process leaves the current system state unchanged, and the update
+process can be resumed, re-using all the data that has already been validated
+and included in the local repository.
+
+### Online web-based OTA updates
+
+OSTree supports bandwidth-efficient retrieval of updates over the network.
+
+The basic workflow involves the actors below:
+
+* the image building pipeline pushes commits to an OSTree repository on
+  each build;
+* a standard web server provides access over HTTPS to the OSTree repository,
+  handling it as a plain hierarchy of static files, with no special knowledge
+  of OSTree;
+* the client devices poll the web server and retrieve updates when they get
+  published.
+
+The following diagram shows how the data flows across services when using the
+web based OSTree upgrade mechanism.
+
+
+
+Thanks to its repository format, OSTree client devices can efficiently query
+the repository status and retrieve only the changed contents without any
+OSTree-specific support in the web server, with the repository files being
+served as plain static files.
+
+This means that any web hosting provider can be used without any loss
+of efficiency.
+
+By only requiring static files, the web service can easily take advantage of
+CDN services to offer a cost efficient solution to get the data out to the
+devices in a way that is globally scalable.
+
+The authentication to the web service can be done via HTTP Basic
+authentication, SSL/TLS client certificates, or any cookie-based mechanism that
+is most suitable for the chosen web service, as OSTree does not impose any
+constraint over plain HTTPS. OSTree separately checks the chain of trust
+linking the downloaded updates to the keys trusted by the system update manager.
+See also the [Controlling access to the updates repository] and
+[Verified updates] sections in this regard.
+
+Monitoring and management of devices can be built using the same HTTPS access as
+used by OSTree or using completely separate mechanisms, enabling the integration
+of OSTree updates into existing setups.
+
+For instance, integration with rollout management suites like [Eclipse
+hawkBit](https://www.eclipse.org/hawkbit/) can happen by disabling the polling
+in the OSTree updater and letting the management suite tell OSTree which commit
+to download and from where, through a dedicated agent running on the devices.
+
+
+
+This has the advantage that the rollout management suite would be in complete
+control of which updates should be applied to which devices, implementing
+any kind of policies, like progressive staged rollouts with monitoring from the
+backend, with minimal integration.
+
+Only the retrieval and application of the actual update data on the device would
+be offloaded to the OSTree-based update system, preserving its network and
+storage efficiency and the atomicity guarantees.
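+
+Since the repository is handled as a plain hierarchy of static files, the web
+server configuration can be minimal. As a hedged sketch, an nginx server block
+could look like the following, where the host name and repository path are
+illustrative assumptions:
+
+```
+server {
+    listen 443 ssl;
+    server_name updates.example.com;
+
+    # Serve the OSTree repository as plain static files
+    location /repo/ {
+        root /srv/ostree;   # the repository lives in /srv/ostree/repo
+    }
+}
+```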
+
+### Offline updates
+
+Some devices may not have any connectivity, or bandwidth requirements may make
+full system updates prohibitive. In these cases updates can be made available
+offline by providing OSTree "static delta" files on external media devices like
+USB mass storage devices.
+
+The deltas are simple static files that contain all the differences between two
+specific OSTree commits.
+The user would download the delta file from a web site and put it in the root
+of an external drive. After the drive is mounted, the update management
+system would look for files with a specific name pattern in the root of
+the drive. If an appropriate file is found, it is checked to be a valid
+OSTree static bundle with the right metadata and, if that verification
+passes, the user would get a notification saying that updates are
+available from the drive. If the update file is corrupted, is targeted
+at other platforms or devices, or is otherwise invalid, the upgrade
+process must stop, leaving the system unchanged, and a notification may
+be reported to the user about the identified issues.
+
+Static deltas can be partial, if the devices are known beforehand to have a
+specific OSTree commit already available locally, or they can be full, by
+providing the delta from the `NULL` empty commit. Full deltas ensure that the
+update can be applied without any assumption about the version running on the
+devices, at the expense of potentially larger requirements on the mass storage
+device used to ship them. Both partial and full deltas leading to the same
+OSTree commit will produce identical results on the devices.
+
+
+### OSTree security
+
+OSTree is a distribution method. It can secure the downloading of the update by
+verifying that it is properly signed using public key cryptography (EdDSA or GPG).
+It is largely orthogonal to verified boot, that is, ensuring that only signed
+code is executed by the system, from the bootloader to the kernel and
+userspace.
+The only interaction is that, since OSTree is a file-based distribution mechanism,
+block-based verification mechanisms like `dm-verity` cannot be used.
+OSTree can be used in conjunction with a signed bootloader, a signed kernel, and
+IMA (Integrity Measurement Architecture) to provide protection from offline
+attacks.
+
+#### Verified boot
+
+Verified boot is the process which ensures a device only runs signed code.
+This is a layered process where each layer verifies the signature of the layer
+above it.
+
+The bottom layer is the hardware, which contains a data area reserved for
+certificates.
+The first step is thus to provide a signed bootloader. The processor can verify
+the signature of the bootloader before starting it.
+The bootloader then reads the boot configuration file. It can then run a signed
+kernel and initramfs.
+Once the kernel has started, the initramfs mounts the root filesystem.
+
+At the time of writing, the boot configuration file is not signed. It is
+read and verified by signed code, and can only point to signed components.
+
+Protecting the bootloader, kernel and initramfs already guarantees that policies
+baked into those components cannot be subverted through offline attacks. By
+verifying the content of the rootfs the protection can be extended to userspace
+components, albeit such protection can only be partial, since device-local data
+can't be signed on the server-side like the rest of the rootfs.
+
+To protect the rootfs, different mechanisms are available: the block-based ones
+like `dm-verity` are fundamentally incompatible with file-based distribution
+methods like OSTree, since they rely on the kernel to verify the signature on
+each read at the block level, guaranteeing that the whole block device has not
+been changed compared to the version signed at deployment time. Due to working
+on raw block devices, `dm-verity` is also incompatible with UBIFS and thus it
+is unsuitable for NAND devices.
+
+Other mechanisms like IMA (Integrity Measurement Architecture) work instead at
+the file level, and thus can be used in conjunction with UBIFS and OSTree on
+NAND devices.
+
+It is also possible to check that the deployed OSTree rootfs matches the
+server-provided signature without using any other mechanism, but unlike IMA and
+`dm-verity` such a check would be too expensive to be done during file access.
+
+#### Verified updates
+
+Once a verified system is running, an OSTree update can be triggered.
+Apertis uses the [ed25519](https://ed25519.cr.yp.to/) variant of EdDSA
+signatures. Ed25519 ensures that the commit was not modified, damaged, or
+corrupted.
+
+On the server, OSTree commits must be signed using an ed25519 secret key.
+This occurs via the `ostree sign --sign-type=ed25519 <COMMIT_ID>` command line.
+The secret key can be provided via an additional CLI parameter or a file by
+using the option `--keys-file=<path_to_file>`.
+
+OSTree expects the secret key to consist of 64 bytes (the 32-byte seed
+concatenated with the 32-byte public key) encoded in base64. The ed25519 secret
+and public parts can be generated by numerous utilities including `openssl`,
+for instance:
+
+```
+openssl genpkey -algorithm ed25519 -outform PEM -out ed25519.pem
+```
+
+Since OSTree is not capable of using the PEM format directly, the secret and
+public keys need to be [extracted](http://openssl.6102.n7.nabble.com/ed25519-key-generation-td73907.html)
+from the PEM file, for example:
+
+```
+PUBLIC="$(openssl pkey -outform DER -pubout -in ${PEMFILE} | tail -c 32 | base64)"
+SEED="$(openssl pkey -outform DER -in ${PEMFILE} | tail -c 32 | base64)"
+```
+
+As mentioned above, the secret key is the concatenation of the seed and public
+parts:
+
+```
+SECRET="$(echo ${SEED}${PUBLIC} | base64 -d | base64 -w 0)"
+```
+
+On the client, ed25519 is also used to ensure that the commit comes from a trusted
+provider, since updates could be acquired through different methods like OTA over
+a network connection, offline updates on plug-in mass storage devices,
+or even mesh-based distribution mechanisms.
+To enable the signature check, the repository on the client must be configured by
+adding the option `sign-verify=true` to the `core` or per-remote section, for instance:
+
+```
+ostree config set 'remote "origin".sign-verify' "true"
+```
+
+OSTree searches for files containing trusted public keys in the directories
+`/usr/share/ostree/trusted.ed25519.d` and `/etc/ostree/trusted.ed25519.d`.
+Any public key in a file in these directories will be trusted by the client.
+Each file may contain multiple keys, one base64-encoded public key per line.
+No private keys should be present in these directories.
+
+In addition, it is possible to provide trusted public keys per-remote, by adding
+to the remote's configuration either the path to a file with trusted public keys
+(via the `verification-file` option) or a single key itself (via the
+`verification-key` option).
+
+In the OSTree configuration, the default is to require commits to be signed.
+However, if no public key is available, no commit can be trusted.
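+
+For example, deploying the public part of the key generated above so that the
+client trusts it could look like the following minimal sketch (the
+`apertis.ed25519` file name is an illustrative assumption):
+
+```
+# On the client: trust the public part of the signing key,
+# one base64-encoded public key per line
+mkdir -p /etc/ostree/trusted.ed25519.d
+echo "${PUBLIC}" > /etc/ostree/trusted.ed25519.d/apertis.ed25519
+```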
+
+#### Securing OSTree updates download
+
+OSTree supports "pinned TLS". Pinning consists of storing the public key of the
+trusted host on the client side, thus eliminating the need for a trust
+authority.
+
+TLS can be configured in the `remote` configuration on the client using
+the following entries:
+
+```
+tls-ca-path
+  Path to file containing trusted anchors instead of the system CA database.
+```
+
+Once a key is pinned, OSTree ensures that any download comes from a host whose
+key is present in the image.
+
+The pinned key can be provided in the disk image, ensuring every flashed device
+is able to authenticate updates.
+
+#### Controlling access to the updates repository
+
+TLS also permits the OSTree client to authenticate itself to the server before
+being allowed to download a commit.
+This can also be configured in the `remote` configuration on the client using
+the following entries:
+
+```
+tls-client-cert-path
+  Path to file for client-side certificate, to present when making requests to
+  this repository.
+tls-client-key-path
+  Path to file containing client-side certificate key, to present when making
+  requests to this repository.
+```
+
+Access to remote repositories can also be controlled via HTTP cookies. The
+`ostree remote add-cookie` and `ostree remote delete-cookie` commands will
+update a per-remote lookaside cookie jar, named `$remotename.cookies.txt`.
+In this model, the client first obtains an authentication cookie before
+communicating this cookie to the server along with its update request.
+
+The choice between authentication via TLS client-side certificates or HTTP
+cookies can be made depending on the chosen server-side infrastructure.
+
+Provisioning authentication keys on a per-device basis at the end of the
+delivery chain is recommended, so each device can be identified and access
+granted or denied at the device granularity. Alternatively it is possible to
+deploy authentication keys at coarser granularities, for instance one for each
+device class, depending on the specific use-case.
+
+#### Security concerns for offline updates over external media
+
+OSTree static deltas include detached metadata with a signature for the
+contained commit, so that both the provenance and the integrity of the commit
+can be checked.
+
+The signed commit is unpacked to a temporary directory and verified by OSTree
+before being integrated in the OSTree repository on the device,
+from which it can be deployed at the next reboot.
+
+This is the same mechanism used for commit verification when doing OTA upgrades
+from remote servers, and provides the same features and guarantees.
+
+Usage of inlined signed metadata ensures that the provided update
+file is aimed at the target platform or device.
+
+Updates from external media present a security problem not present for
+directly downloaded updates. Simply verifying the signature of a file
+before decompressing is an incomplete solution, since a user with
+sufficient skill and resources could create a malicious USB mass storage
+device that presents different data during the first and second read of
+a single file – passing the signature test, then presenting a different
+image for decompression.
+
+For this reason, the content of the update file is extracted into the temporary
+directory and the signature is checked on the extracted commit tree.
+
+### Error handling
+
+If for any reason the update process fails to complete, the update will
+be blacklisted to avoid re-attempting it.
+Another update won't be
+automatically attempted until a newer update is made available.
+
+The only exception to this rule is a failure due to an incorrect signature
+check: the commit could have been signed with a key not yet known to the
+client, and as soon as the client acquires the new public key the blacklist
+mechanism shouldn't prevent the update.
+
+It is possible that an update is successfully installed yet fails to
+boot, resulting in a rollback. In the event of a rollback the update
+manager must detect that the new update has not been correctly
+booted, and blacklist the update so it is not attempted again. To detect a
+failed boot, a watchdog mechanism can be used.
+The failed updates can then be blacklisted by appending their OSTree commit ids
+to a list.
+
+This policy prevents a device from getting caught in a loop of
+rollbacks and failed updates, at the expense of running an old version of
+the system until a newer update is pushed.
+
+The most convenient storage location for the blacklist is the user
+storage area, since it can be written at run-time. As a side effect of
+storing the blacklist there, it will automatically be cleared if the
+system is reset to a clean state.
+
+
+
+## Implementation
+
+This section provides some more details about the implementation of offline
+system updates and rollback in Apertis, which is split in three main
+components:
+
+* the updater daemon
+* the bootloader integration
+* the command-line HMI
+
+### The general flow
+
+The Apertis update process deals with selecting the OSTree deployment to boot,
+rolling back to known-good deployments in case of failure and preparing the new
+deployment on updates:
+
+
+
+While the underlying approach differs due to the use of OSTree in Apertis over
+the dual-partition approach chosen by ChromeOS and the different bootloaders,
+the update/rollback process is largely the same as [the one in
+ChromeOS](https://www.chromium.org/chromium-os/chromiumos-design-docs/filesystem-autoupdate#TOC-Diagram).
+
+### The boot count
+
+To keep track of failed updates the system maintains a persistent counter that
+is increased every time a boot is attempted.
+
+Once a boot is considered successful, depending on project-specific policies (for
+instance, when a specific set of services has been started with no errors), the
+boot count is reset to zero.
+
+This boot counter needs to be handled in two places:
+
+* in the bootloader, which boots the current OSTree deployment if the counter
+  is zero and initiates a rollback otherwise
+* in the updater, which needs to reset it to zero once the boot is
+  considered successful
+
+Using the main persistent storage to store the boot count is viable for most
+platforms, but would produce too much wear on platforms using NAND devices.
+In those cases the boot count should be stored in another platform-specific
+location which is persistent over warm reboots; there's no need for it to
+persist over power losses.
+
+However, the reference implementation focuses on the most general
+solution first, while being flexible enough to accommodate other solutions
+whenever needed.
+
+### The bootloader integration
+
+Since bootloaders are largely platform-specific, the integration needs to be
+done per-platform.
+
+For the SabreLite ARM 32-bit platform, integration with the [U-Boot] bootloader
+is needed.
+
+OSTree already provides dedicated hooks to update the `u-boot` environment to
+point it to the latest deployment.
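+
+U-Boot's built-in boot counting support (`CONFIG_BOOTCOUNT_LIMIT`) is one way
+to implement the boot count handling described above. The following is a
+minimal sketch of the environment variables involved; the `bootcmd_previous`
+script name is an illustrative assumption, not the actual Apertis
+configuration:
+
+```
+# ${bootcount} is incremented by U-Boot on every boot attempt;
+# once it exceeds ${bootlimit}, ${altbootcmd} runs instead of ${bootcmd}
+setenv bootlimit 3
+setenv altbootcmd 'run bootcmd_previous'  # boot the previous deployment
+saveenv
+```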
+
+Two separate boot commands are used to start the system: the default one boots
+the latest deployment, while the alternate one boots the previous deployment.
+
+Before rebooting to a new deployment, the boot configuration file is switched:
+the one for the new deployment is made the default, while the older one is
+put into the location pointed to by the alternate boot command.
+
+When a failure is detected by checking the boot count while booting the latest
+deployment, the system reboots using the alternate boot command into the
+previous deployment, where the rollback is completed.
+
+Once the boot procedure completes successfully, the boot count gets reset and
+stopped, so failures in subsequent boots won't cause rollbacks which may worsen
+the failure.
+
+If the system detects that a rollback has been requested, it also needs to make
+the rollback persistent and prevent the faulty updates from being tried again.
+To do so, it adds any deployment more recent than the current one to a local
+blacklist and then drops them.
+
+### The updater daemon
+
+The updater daemon is responsible for most of the activities involved, such as
+detecting available updates, initiating the update process and managing the
+boot count.
+
+It handles both online OTA updates and offline updates made available on
+locally mounted devices.
+
+#### Detecting new available updates
+
+For offline updates, the
+[GVolumeMonitor](https://developer.gnome.org/gio/stable/GVolumeMonitor.html)
+API provided by GLib/GIO is used to detect when a mass storage device is
+plugged into the device, and the
+[GFile](https://developer.gnome.org/gio/stable/GFile.html) GLib/GIO API is
+used to scan for the offline update, stored as a plain file named
+`static-update.bundle` in the root of the plugged filesystem.
+
+For online OTA updates, the
+[`OstreeSysrootUpgrader`](https://github.com/ostreedev/ostree/blob/master/src/libostree/ostree-sysroot-upgrader.c)
+is used to poll the remote repository for new commits in the configured branch.
+
+When combined with rollout management systems like [Eclipse
+hawkBit](https://www.eclipse.org/hawkbit/), the rollout management agent on the
+device will initiate the upgrade process without the need for polling.
+
+#### Initiating the update process
+
+Once the update is detected, it is verified and compared against a local
+blacklist to skip updates that have already failed in the past (see [Update validation]).
+
+In the offline case, the static delta file is checked for consistency before
+being unpacked in the local OSTree repository.
+
+During online updates, files are verified as they get downloaded.
+
+In both cases the new update results in a commit in the local OSTree repository,
+and from that point the process is identical: a new deployment is created from
+the new commit and the bootloader configuration is updated to point to the new
+deployment on the next boot.
+
+#### Reporting the status to interested clients
+
+The updater daemon exports a simple D-Bus interface which allows clients to
+check the state of the update process and to mark the current boot as
+successful.
+
+#### Resetting the boot count
+
+During the boot process the boot count is reset to zero using an interface that
+abstracts over the platform-specific approach.
+
+While product-specific policies dictate when the boot should be considered
+successful, the reference images consider a boot to be successful if the
+`multi-user.target` target is reached.
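+
+As a hedged sketch of how the reference policy could be wired up, a systemd
+unit ordered after `multi-user.target` could invoke the updater's "mark
+successful" entry point; the unit itself is an illustrative assumption, while
+`updatectl --mark-update-successful` is the command described below:
+
+```
+[Unit]
+Description=Mark the current boot as successful
+After=multi-user.target
+
+[Service]
+Type=oneshot
+ExecStart=/usr/bin/updatectl --mark-update-successful
+
+[Install]
+WantedBy=multi-user.target
+```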
+
+#### Marking deployments
+
+Rolled back deployments are added to a blacklist to avoid trying them again
+over and over.
+
+Deployments that have booted successfully get marked as known-good so that they
+are never rolled back, even if at a later point a failure in the boot process
+is detected. This is to avoid transient issues causing an unwanted rollback
+which may make the situation worse.
+
+To do so, the boot counting is stopped once the current boot is considered
+successful, effectively marking the current boot as known-good without the need
+to maintain a whitelist and synchronize it with the bootloader.
+
+### Command line HMI
+
+A command line tool is provided to query the status using
+[the `org.apertis.ApertisUpdateManager` D-Bus API](https://gitlab.apertis.org/appfw/apertis-update-manager/blob/master/data/apertis-update-manager-dbus.xml):
+
+```
+$ updatectl
+** Message: Network connected: No
+** Message: Upgrade status: Deploying
+```
+
+The current API exposes information about whether the updater is idle, an
+update is being checked, retrieved or deployed, or whether a reboot is pending
+to switch to the updated system.
+
+It can also be used to mark the boot as successful:
+
+```
+$ updatectl --mark-update-successful
+```
+
+### Update validation
+
+Before installing updates, the updater checks their validity and appropriateness
+for the current system, using the metadata carried by the update itself as
+produced by the build pipeline.
+It ensures that the update is appropriate for the system by verifying that the
+collection id in the update matches the one configured for the system. This
+prevents installing an update meant for a different kind of device, or mixing
+variants.
+The updater also checks that the update version is newer than the one on the
+system, to prevent downgrade attacks where an older update with known
+vulnerabilities is used to gain privileged access to a target.
+
+### Testing
+
+Testing ensures that the following system properties for each image
+are maintained:
+
+* the image can be updated if a newer update bundle is plugged in
+* the update process is robust in case of errors
+* the image initiates a rollback to a previous deployment if an error is
+  detected on boot
+* the image can complete a rollback initiated from a later deployment
+
+To do so, a few components are needed:
+
+* system update bundles have to be built as part of the daily build pipeline
+* a known-good test update bundle with a very large version number must be
+  created to test that the system can update to it
+
+At least initially, testing is done manually. Automation from LAVA will be
+researched later.
+
+#### Images can be updated
+
+After plugging in a device containing the known-good test update bundle, the
+expectation is that the system detects it, initiates the update and, on reboot,
+the deployment from the known-good test bundle is used.
+
+#### The update process is robust in case of errors
+
+To test that errors during the update process don't affect the system, the
+device is unplugged while the update is in progress. Re-plugging it after that
+checks that updates are gracefully restarted after transient errors.
+
+#### Images roll back in case of error
+
+Injecting an error in the boot process checks that the image initiates the roll
+back to a previous deployment. Since a newly-flashed image doesn't have any
+previous deployment available, one needs to be manually set up beforehand by
+downloading an older OSTree commit.
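+
+Setting up such a previous deployment could look like the following sketch,
+where the remote name and the commit variable are illustrative assumptions:
+
+```
+# Fetch an older commit from the configured remote and deploy it,
+# so that the rollback path has a deployment to land on
+ostree pull origin ${OLDER_COMMIT}
+ostree admin deploy ${OLDER_COMMIT}
+```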
+
+#### Images are a suitable rollback target
+
+A known-bad deployment, similar to the known-good one, can be used to ensure that
+the current image works as intended when it is the destination of a rollback
+initiated by another deployment.
+
+After updating to the known-bad deployment, the system should roll back to the
+version under test, which should then complete the rollback by cleaning the
+boot count, blacklisting the failed update and undeploying it.
+
+## User and user data management
+
+As described in the
+[Multiuser](https://designs.apertis.org/latest/multiuser.html) design document,
+Apertis is meant to accommodate multiple users on the same device,
+using existing concepts and features of the several open source components that
+are being adopted.
+
+All user data should be kept in the general storage volume on setups where it
+is available, as it enables simpler separation of concerns, and a simpler
+implementation of user data removal.
+
+Rolling back user and application data cannot be generally applied, and no
+existing general purpose system supports it.
+Applications must be prepared to handle configuration and data files coming
+from later versions and handle that gracefully, either ignoring unknown
+parameters or falling back to the default configuration if the provided one is
+not usable.
+
+Specific products can choose to hook into the update system and manage their own
+data rollback policies.
+
+## Application management
+
+Application management on Apertis has requirements that the main update
+management system does not:
+
+- It is unreasonable to expect a system restart after an application
+  update.
+
+- Each application must be tracked independently for rollbacks. System
+  updates only track one “stream” of rollbacks, whereas the application
+  update framework must track many.
+
+Flatpak matches these requirements and is also based on OSTree. The ability to
+deduplicate contents between the system repository and the applications
+decouples applications from the base OS while keeping the impact on storage
+consumption minimal.
+
+### Application storage
+
+Applications can be stored per-device or per-user depending on the needs of
+the product.
+
+An application may require storage space for personal settings, license
+information, caches, and any manner of long term private storage. These
+files should generally not be easily accessible to the user, as directly
+modifying them could have detrimental effects on the application.
+
+Application storage requirements can be divided into broad groups:
+
+- An area for application exports to integrate with the system. This
+  is managed by the application manager and not directly by
+  applications themselves.
+
+- User specific application data – for settings and any other per-user
+  files. In the event of an application rollback, depending on the product,
+  this data may get rolled back with the application or the application needs
+  to deal with potentially mismatching versions.
+
+- Application specific application data – for data that is rolled back
+  with an application but isn't tied to a user account – such as voice
+  samples or map data. This data should be handled in the same way as user
+  specific application data.
+
+- Cache – easily recreated data. To save space, this should not be stored
+  for rollback purposes, and should be cleared on a rollback in case
+  applications change their cache data formats between versions.
+
+- Storage for files in standard formats that aren't tied to specific
+  applications. As explained in the
+  [Multiuser](https://designs.apertis.org/latest/multiuser.html) design, this
+  storage is shared between all users. This data should be exempt from the
+  rollback system.
+
+## Further developments
+
+* Handling a larger threat model using [The Update Framework
+  Specification](https://github.com/theupdateframework/specification/blob/master/tuf-spec.md)
+  / [Uptane](https://uptane.github.io/) with [Aktualizr](https://foundries.io/insights/2018/05/25/ota-part-1/)
+
+* Integrating with server side management services like [Eclipse hawkBit](https://www.eclipse.org/hawkbit/)
+
+* Hardware-assisted [verified boot](https://www.chromium.org/chromium-os/chromiumos-design-docs/verified-boot)
+  with TPM/OpTEE
+
+* Filesystem-level integrity checks with [Integrity Measurement Architecture
+  (IMA)/Extended Verification Module (EVM)](https://sourceforge.net/p/linux-ima/wiki/Home/)
+
+* Adding a failsafe partition to handle filesystem corruption
+
+## Related Documents
+
+A survey of system update managers:
+
+  - [*https://wiki.yoctoproject.org/wiki/System_Update*](https://wiki.yoctoproject.org/wiki/System_Update)
+
+The OSTree bootable filesystems tree store:
+
+  - [*http://ostree.readthedocs.io*](http://ostree.readthedocs.io)
+
+The U-Boot boot loader:
+
+  - [*http://www.denx.de/wiki/U-Boot/WebHome*](http://www.denx.de/wiki/U-Boot/WebHome)
+
+The ChromeOS autoupdate system:
+
+  - [*https://www.chromium.org/chromium-os/chromiumos-design-docs/filesystem-autoupdate*](https://www.chromium.org/chromium-os/chromiumos-design-docs/filesystem-autoupdate)
+
+[OSTree]: http://ostree.readthedocs.io
+
+[OSTree atomic upgrades]: https://ostree.readthedocs.io/en/latest/manual/atomic-upgrades/#atomic-upgrades
+
+[GPT]: http://en.wikipedia.org/wiki/GUID_Partition_Table
+
+[U-Boot]: http://www.denx.de/wiki/U-Boot/WebHome
diff --git a/content/designs/test-data-reporting.md b/content/designs/test-data-reporting.md
new file mode 100644
index 0000000000000000000000000000000000000000..f9a8760acf899f2c611505fad103d29ed02d52ad
--- /dev/null
+++ b/content/designs/test-data-reporting.md
@@ -0,0 +1,402 @@
+---
+title: Test Data Reporting
+short-description: Describe test data reporting and visualization.
+authors:
+  - name: Luis Araujo
+---
+
+# Background
+
+Testing is a fundamental part of the project, but it is not so useful unless it
+goes along with an accurate and convenient model to report the results of such
+testing.
+
+Receiving notifications on time about critical issues, easily checking test
+results, and analyzing test trends among different image versions are some of
+the examples of test reporting that help to keep a project in a good state
+through its different phases of development.
+
+The goal of this document is to define a model for on-time and accurate reporting
+of test results and issues in the project. This model should be adapted to the
+project needs and requirements, along with support for convenient visualization of
+the test data and reports.
+
+The solution proposed in this document should fit the mechanisms available to
+process the test data in the project.
+
+# Current Issues
+
+ - Test reports are created manually and stored in the wiki.
+ - There is no convenient way to analyze test data and check test logs.
+ - There is no proper notification system for critical issues.
+ - There is no mechanism to generate statistics from the test data.
+ - There is no way to visualize test data trends.
+
+# Solution
+
+A system or mechanism must be implemented with a well defined workflow fulfilling
+the project requirements to cover the reporting and visualization of all the
+test data.
+
+The solution will mainly involve designing and implementing a web application
+dashboard for the visualization of test results and test cases, and a notification
+mechanism for test issues.
+
+# Test Cases
+
+Test cases will be available from the Git repository in YAML format as explained
+in the document about [test data storage and processing][TestDataStorage].
+
+The Gitlab Web UI will be used to read test cases rendered in HTML format.
+
+A link to the test case page in Gitlab will be added to the test results metadata
+to easily find the exact test case instructions that were used to execute the
+tests. This link will be shown in the web application dashboard and the SQUAD
+UI for convenient access to the test case for each test result.
+
+As originally proposed in the [test data storage][TestDataStorage] document, the
+test case file will be the canonical specification for the test instructions, and
+it will be executed both by the automated tests and during manual test execution.
+
+# Test Reports
+
+Test results will be available from the SQUAD backend in JSON format as explained
+in the document about [test data storage and processing][TestDataStorage].
+
+The proposal for reporting test results involves two solutions: a web application
+dashboard and a notification mechanism. Both of them will use the SQUAD API to
+access the test data.
+
+# Web Application Dashboard
+
+A web application dashboard must be developed to view test results and generate
+reports from them. This dashboard will serve as the central place for test data
+visualization and report generation for the whole project.
+
+The web application dashboard will be running as an HTTP web service, and it can
+be accessed using a web browser. Details about the specific framework and platform
+will be defined during implementation.
+
+This application should allow at least the following:
+
+ - Filter and view test results by priority, test categories, image types,
+   architecture and test type (manual or automated).
+ - Link test results to the specific test cases.
+ - Graphics to analyze test data trends.
+
+The web application won't process or manipulate the test results in any way, nor
+change the test data in the storage backend. Its only purpose is to
+generate reports and visual statistics for the test data, so it only has a one-way
+communication channel with the data storage backend in order to fetch the test
+data.
+
+The application may also be progressively extended to export data in different
+formats such as spreadsheets and PDFs.
+
+This dashboard will serve as a complement to the SQUAD Web UI, which is more
+suitable for developers.
+
+## Components
+
+The web application will consist of at least the following functional modules:
+
+ - Results Fetcher
+ - Filter and Search Engine
+ - Results Renderer
+ - Test Report Generator
+ - Graphics Generator
+ - Application API (Optional)
+ - Format Exporters (Optional)
+
+Each of these components or modules can be independent tools or be part of a single
+framework for web development. Proper research into the most suitable model
+and framework should be done during implementation.
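+
+As a rough illustration of the one-way communication channel, a results fetcher
+could retrieve test data with plain HTTP calls against the SQUAD REST API. The
+instance URL and token below are illustrative assumptions, and the exact
+authentication header depends on how the SQUAD deployment is configured:
+
+```
+# Fetch the list of projects (and from there their builds and test runs)
+# from a hypothetical SQUAD instance
+curl -H "Authorization: Token ${SQUAD_TOKEN}" \
+     "https://squad.example.com/api/projects/"
+```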
+
+Apart from these components, new ones might be added during implementation to support
+the above components and any other functionality required by the web application
+dashboard (for example, HTML and data rendering, allowing privileged operations
+if needed, and so on).
+
+This section will give an overview of each of the above listed components.
+
+### Results Fetcher
+
+This component will take care of fetching the test data from the storage backend.
+
+As explained in the [test data storage document][TestDataStorage], the data storage
+backend is SQUAD, so this component can use the SQUAD API to fetch the required
+test results.
+
+### Filter and Search Engine
+
+This involves all the filtering and searching capabilities for test data, and it can
+be implemented either using existing web application modules or extending those
+to suit the dashboard needs.
+
+This engine will only search and filter test results data and won't manipulate that
+data in any other way.
+
+### Results Renderer
+
+This component will take care of showing the test results visualization. It is
+basically the HTML renderer for the test data, with all the required elements
+for the web pages design.
+
+### Test Report Generator
+
+This includes all the functions to generate all kinds of test reports. It can also
+be split into several modules (for example, one for each type of report), and
+it should ideally offer a command line API that can be used to trigger and fetch
+test reports remotely.
+
+### Graphics Generator
+
+It comprises all the web application modules to generate graphics, charts and any
+other visual statistics, including the history view. In the same way as other
+components, it can be composed of several smaller components.
+
+### Application API
+
+Optionally, the web application can also make available an API that can be used to
+trigger certain actions remotely; for example, generation and fetching of test
+reports, or test data exporting are some of the possible features for this API.
+
+### Format Exporters
+
+This should initially be considered an optional module which will include support
+to export the test data into different formats, for example, PDF and spreadsheets.
+
+It can also offer a convenient API to trigger this kind of format generation
+using command line tools remotely.
+
+## History View
+
+The web application should offer a compact historical overview of all the test
+results over specific periods of time, to distinguish at a glance important
+trends in the results.
+
+This history view will also be able to show results from arbitrarily chosen dates,
+so in this way it is possible to generate views for comparing test data between
+different image cycles (daily, weekly or release images).
+
+This view should be a graphical visualization that can be generated periodically or
+at any time as needed from the web application dashboard.
+
+In a single compact view, at least the following information should be available:
+
+ - All test names executed per image.
+ - List of image versions.
+ - Platforms and image types.
+ - Number of failed, passed and total tests.
+ - Graphic showing the trend of test results across the different image
+   versions.
+
+### Graphical Mockup
+
+The following is an example of how the history view might look for test
+results:
+
+
+
+## Weekly Test Report
+
+This report should be generated using the web application dashboard described in
+the previous section.
+
+The dashboard should allow this report to be generated weekly or at any time as
+needed, and it should offer both a Web UI and a command line interface to generate
+the report.
+
+The report should contain at least the following data:
+
+ - List of images used to run the tests.
+ - List of tests executed, ordered by priority, image type, architecture and
+   category.
+ - Test results, in the form: PASS, FAIL, SKIP.
+ - Image version.
+ - Date of test execution.
+
+The report could also include the historical view as explained in the section
+[](#history-view) and allow exporting to all formats supported by the web
+application dashboard.
+
+## Application Layout and Behaviour
+
+The web application dashboard will only show test results and generate test
+reports.
+
+Once launched, the web application will fetch the test data from SQUAD directly
+to generate all the relevant web pages, graphics and test reports. So, the web
+application won't store any test data, and all visual information will be
+generated at runtime.
+
+For the main layout, the application will show in the initial main page the history
+view for the last 5~10 image versions, as this will help to give a quick overview
+of the current status of tests for the latest images at first glance.
+
+Along with the history view in the main page, a list of links to the latest test
+reports will also be shown. These links can point to previously saved searches or
+they can just be convenient links to generate test reports for past image
+versions.
+
+The page should also show the relevant options for filtering and searching test
+results as explained in the [web application dashboard section](#web-application-dashboard).
+
+In summary, the minimal required layout of the main page for the web application
+dashboard will be the history view, a list of recent test reports and the searching
+and filtering options.
+
+# Notifications
+
+At least for critical and high priority test failures, a notification system must
+be set up.
+
+This system could send emails to a mailing list and messages to the Mattermost
+chat system, to give issues greater and more timely visibility.
+
+This system will work as proposed in the [closing ci loop document][ClosingCiLoop].
+It will be a Jenkins phase that will receive the automated test results previously
+analyzed, and will determine the critical test failures in order to send the
+notifications.
+
+For manual test results, the Jenkins phase could be manually or periodically
+triggered once all the test results are stored in the SQUAD backend.
+
+## Format
+
+The notification message should at least contain the following information:
+
+ - Test name.
+ - Test result (FAIL).
+ - Test priority.
+ - Image type and architecture.
+ - Image version.
+ - Link to the logs (if any).
+ - Link to attachments (if any).
+ - Date and time of test execution.
+
+# Infrastructure
+
+The infrastructure for the web application dashboard and notification system will
+be defined during implementation, but it will all be aligned with the requirements
+proposed by the document for [closing the CI loop][ClosingCiLoop], so it won't
+impose any special or resource-intensive requirements beyond the current CI loop
+proposal.
+
+# Test Results Submission
+
+For automated tests, the test case will be executed by LAVA and results will be
+submitted to the SQUAD backend as explained in the [closing ci loop document][ClosingCiLoop].
+
+For manual tests, a new tool is required to collect the test results and submit
+them to SQUAD.
+This tool can be either a command line tool or a web application that
+could render the test case pages for convenient visualization during the test
+execution, or link to the test case Gitlab pages for easy reference.
+
+The main function of this application will be to collect the manual test results,
+optionally guide the tester through the test case steps, generate a JSON file
+with the test results data, and finally send these results to the SQUAD backend.
+
+# SQUAD
+
+SQUAD offers a web UI frontend that allows checking test results and metadata,
+including their attachments and logs.
+
+This web frontend is very basic: it only shows the tests organized by teams and
+groups, and lists the test results for each test stored in the backend. Though it
+is a basic frontend, it can be useful for quickly checking results and making sure
+the data is properly stored in SQUAD, but it is intended to be used only by
+developers and sometimes testers, as it is not a complete solution from a project
+management perspective.
+
+For a more complete visualization of the test data, the new web application
+dashboard should be used.
+
+# Concept Limitations
+
+The platform, framework and infrastructure for the web application are not covered
+by this document and need to be defined during implementation.
+
+# Current Implementation
+
+The [QA Test Report][QAReportApplication] is an application to save and report all
+the test results for the Apertis images.
+
+It supports both types of tests: automated test results executed by LAVA and
+manual test results submitted by a tester. It only provides static reports, with no
+analytical tools yet.
+
+## Workflow
+
+The deployment consists of two Docker images, one containing the main report
+application and the other running the PostgreSQL database. The general workflow is
+as follows:
+
+### Automated Tests
+
+1) The QA Report Application is executed and it opens HTTP interfaces to receive
+   HTTP request calls and serve HTML pages on specific HTTP routes.
+
+2) Jenkins builds the images and they are pushed to the image server.
+
+3) Jenkins triggers the LAVA jobs to execute the automated tests on the published
+   images.
+
+4) Jenkins, when triggering the LAVA jobs, also registers these jobs with the QA
+   Report Application using its specific HTTP interface.
+
+5) The QA Report application adds these jobs to its internal queue and waits
+   for the LAVA test job results to be submitted via HTTP.
+
+6) Once LAVA finishes executing the test jobs, it triggers the configured HTTP
+   callback, sending all the test data to the QA Report application.
+
+7) Test data for the respective job is saved into the database.
+
+### Manual Tests
+
+1) The user authenticates with GitLab credentials using the `Login` button in the
+   main page.
+
+2) Once logged in, the user can click on the `Submit Manual Test Report` button
+   that is now available from the main page.
+
+3) The tester needs to enter the following information in the `Select Image Report`
+   page:
+
+   - Release: Image release (19.03, v2020dev0 ..)
+   - Version: The daily build identifier (20190705.0, 20190510.1 ..)
+   - Select Deployment Type (APT, OSTree)
+   - Select Image Type
+
+4) A new page only showing the valid test cases for the selected image type
+   is shown.
+
+5) The user selects `PASS`, `FAIL` or `NOT TESTED` for each test case.
+
+6) An optional `Notes` text area box is available beside each test case for the
+   user to add any extra information (e.g. task links, a brief comment about any
+   issue with the test, etc.).
+
+7) Once results have been selected for all test cases, the user should submit this
+   data using the `Submit All Results` button at the top of the page.
+
+8) The application will now save the results into the database and redirect the
+   user to a page with the following two options:
+
+   - Submit Manual Test Report: To submit test results for a new image type.
+   - Go Back to Main Page: To check the recently submitted test results.
+
+9) If the user wants to update a report, they just repeat the above steps,
+   selecting the specific image type for the existing report and then updating
+   the results for the necessary test cases.
+
+### Reports
+
+1) Reports for the stored test results (both manual and automated) are generated
+   on the fly by the QA application, for example: https://lavaphabbridge.apertis.org/report/v2019dev0/20190401.0
+
+[TestDataStorage]: test-data-storage.md
+
+[ClosingCiLoop]: closing-ci-loop.md
+
+[QAReportApplication]: https://gitlab.apertis.org/infrastructure/lava-phab-bridge/
diff --git a/content/designs/test-data-storage.md b/content/designs/test-data-storage.md
new file mode 100644
index 0000000000000000000000000000000000000000..32342c9ee45c3cfdfb7bd3aa69a718e9ea1ff727
--- /dev/null
+++ b/content/designs/test-data-storage.md
@@ -0,0 +1,936 @@
+---
+title: Test Data Storage
+short-description: Describe the test data storage backend and processing.
+authors:
+  - name: Luis Araujo
+---
+
+# Background
+
+Testing is a core part of the project, and different test data is required to
+optimise the testing process.
+
+Currently the project does not have a functional and well defined place for
+storing the different types of test data, which creates many issues across
+the testing processes.
+
+The goal of this document is to define a single storage place for all the test
+data and build on top of it the foundation for accurate test data processing and
+reporting.
+
+# Current Issues
+
+## Test Case Issues
+
+At this time, test cases are stored in the Apertis MediaWiki instance with a
+single page for each test case. Although this offers a reasonable degree of
+visibility for the tests, the storage method is not designed to manage this
+type of data, which means that there are only some limited features available
+for handling the test cases.
+
+The wiki does not provide a convenient way to reuse this data through other
+tools or infrastructure services. For example, management functions like
+filtering or detailed searching are not available.
+
+Test cases may also come out of sync with the automated tests, since they are
+managed manually in different places: an automated test might not have a test
+case page available, or the test case could be marked as obsolete while it is
+still being executed automatically by LAVA.
+
+Another big issue is that test cases are not versioned, so there is no way to
+keep track of which specific version of a test case was executed for a specific
+image version.
+
+## Test Result Issues
+
+Automated test results are stored in the LAVA database after the tests are
+executed, while manual test results are manually added by the tester to the
+wiki page report for the weekly testing round. This means that the wiki is the
+only storage location for all test data. As with test cases, there are also
+limits to the functionality of the wiki when handling the results.
+
+LAVA does not offer a complete interface or dashboard to clearly track test
+results, and the current interface to fetch these results is not user friendly.
+
+Manual results are only available from the Apertis wiki in the Weekly Testing
+Report page, and they are not stored elsewhere.
+
+The only way to review trends between different test runs is to manually go
+through the different wiki and LAVA web pages of each report, which is
+cumbersome and time consuming.
+
+Essentially, there is no canonical place for storing all the test results for
+the project. This has major repercussions, since there is no way to keep proper
+track of the health of the whole project.
+
+## Testing Process Issues
+
+The biggest issue is the lack of a centralised data storage for test results
+and test cases, creating the following issues for the testing process:
+
+ - It is not possible to easily analyse test results. For example, there is no
+   interface for tracking test result trends over a period of time or across
+   different releases.
+
+ - Test cases are not versioned, so it is not possible to know exactly which
+   test cases are executed for a specific image version.
+
+ - Test case instructions can differ from the actual test instructions being
+   executed. This issue tends to happen mainly with automated tests: for
+   example, when a test script is updated but the corresponding test case
+   misses the update.
+
+ - Test results cannot be linked to test cases because test data is located in
+   different places and test cases have no version information.
+
+# Solution
+
+A data storage backend needs to be defined to store all test cases and test
+results.
+
+The storage backend may not necessarily be the same for all the data types,
+but a well defined mechanism should be available to access this data in a
+consistent way from our current infrastructure, and one solution should not
+impose limitations or constraints onto the other. For example, one backend can
+be used only for test cases and another for test results.
+
+## Data Backend Requirements
+
+The data storage backend should fulfil the following conditions at the minimum:
+
+ - Store all test cases.
+ - Store all manual and automated test results.
+ - It should make no distinction between manual and automated test cases, and
+   ideally offer a very transparent and consistent interface for both types of
+   tests.
+ - It should offer an API to access the data that can be easily integrated with
+   the rest of the services in the existing infrastructure.
+ - It should allow the execution of management operations on the data
+   (querying, filtering, searching).
+ - Ideally, it should offer a frontend to simplify management operations.
+
+# Data
+
+We are interested in storing two types of test data: test cases and test
+results.
+
+## Test Cases
+
+A test case is a specification containing the requirements, environment, inputs,
+execution steps, expected results and metadata for a specific test.
+
+The test case descriptions in the wiki include custom fields that will need to
+be defined during the design and development of the data storage solution. The
+design will also need to consider the management, maintenance and tools required
+to handle all test case data.
+
+## Test Results
+
+Test results can be of two types: manual and automated.
+
+Test results are currently acquired in two different places depending on the
+test type, which makes it very inconvenient to process and analyse test data.
+
+Therefore, the data backend solution should be able to:
+
+ - Store manual test results, which will be manually entered by the tester.
+ - Store automated test results, which will be fetched from the LAVA database.
+ - Have all results in the same place and format to simplify reporting and
+   manipulation of such data.
+
+# Data Usage
+
+The two main uses for test result data will be reports and statistics.
+
+## Test Reports
+
+A test report shows the test results for all the applicable test cases executed
+on a specific image version.
+
+The test reports are currently created weekly. They are created manually with
+the help of some scripts and stored on the project wiki.
+
+New tools will need to be designed and developed to create reports once the
+backend solution is implemented.
+
+These tools should be able to query the test data using the backend API to
+produce reports both as needed and at regular intervals (weekly, monthly).
+
+## Test Statistics
+
+Accurate and up-to-date statistics are an important use case for the test data.
+
+Even though these statistics could be generated using different tools, there
+may still exist the need for storing this data somewhere. For example, for every
+release, besides the usual test report, a final `release report` giving a more
+detailed overview of the whole release's history could be generated.
+
+The backend should also make it possible to easily access the statistics data for
+further processing, for example, to download it and manipulate the data using a
+spreadsheet.
+
+# Data Format
+
+Test data should ideally be in a well-known standard format that can be reused
+easily by other services and tools.
+
+In this regard, the data format is an important point for consideration when
+choosing the backend, since it will have a major impact on the project: it will
+help to determine the infrastructure requirements and the tools which need to be
+developed to interact with such data.
+
+# Version Management
+
+Test cases and test results should be versioned.
+
+Though this is more related to the way data will be used, the backend might also
+have an impact on managing versions of this data.
+
+One of the advantages of versioning is that it will allow linking test cases to
+test results.
+
+# Data Storage Backends
+
+These sections give an overview of the different data backend systems that can
+be used to implement a solution.
+
+## SQUAD
+
+SQUAD stands for `Software Quality Dashboard` and it is an open source test
+management dashboard.
+
+It can handle test results with metrics and metadata, but it offers no support
+for test case management.
+
+SQUAD is a database with an HTTP API to manage test result data. It uses an SQL
+database, like MySQL or PostgreSQL, to store results. Its web frontend and API
+are written using Django.
+
+Therefore, it would not require much effort to modify our current infrastructure
+services to be able to push and fetch test results from SQUAD.
+
+Advantages:
+
+ - Simple HTTP API: POST to submit results, GET to fetch results.
+ - Easy integration with all our existing infrastructure.
+ - Test results, metrics and metadata are in JSON format.
+ - Offers support for PASS/FAIL results with metrics, if available.
+ - Supports authentication tokens for using the HTTP API.
+ - Has support for teams and projects. Each team can have multiple projects
+   and each project can have multiple builds with multiple test runs.
+ - It offers group permissions and visibility options.
+ - It offers optional backend support for LAVA.
+ - Actively developed, and upstream is open to contributions.
+ - It provides a web frontend to visualise test result data with charts.
+ - It is a Django application using a stable database system like PostgreSQL.
+
+Disadvantages:
+
+ - It offers no built-in support for storing manual test results, but it
+   should be straightforward to develop a new tool or application to submit
+   these test results.
+ - It has no support for test case management. This could either be added to
+   SQUAD, or a different solution could be used.
+ - The web frontend is very simple and it lacks support for many visual charts.
+   It currently only supports very simple metrics charts for test results.
+
+## Database Systems
+
+Since the problem is about storing data, a plain SQL database is also a valid
+option to be considered.
+
+A reliable DB system could be used, for example PostgreSQL or MySQL, with an
+application built on top of it to manage all test data.
+
+Newer database systems, such as CouchDB, can also offer more advanced features.
+CouchDB is a NoSQL database that stores data using JSON documents. It also
+offers an HTTP API that allows sending requests to manage the stored data.
+
+This database acts like a server that can interact with remote applications
+through its HTTP API.
+
+Advantages:
+
+ - Very simple solution to store data.
+ - Advanced database systems can offer an API and features to interact with
+   data remotely.
+
+Disadvantages:
+
+ - All applications to manage data need to be developed on top of the database
+   system.
+
+## Version Control Systems
+
+A version control system (VCS), like Git, could be used to store all or part of
+the test data.
+
+This approach would involve a design from scratch for all the components to
+manage the data in the VCS, but it has the advantage that the solution can be
+perfectly adapted to the project needs.
+
+A data format would need to be defined for all data and document types,
+alongside a structure for the directory hierarchy within the repository.
+
+Advantages:
+
+ - It fits the model of the project perfectly. All project members can easily
+   have access to the data and are already familiar with this kind of system.
+ - It offers good versioning and history support.
+ - It allows other tools, frameworks or infrastructure services to easily reuse
+   data.
+ - Due to its simplicity and re-usability, it can be easily adapted to other
+   projects and teams.
+
+Disadvantages:
+
+ - All applications and tools need to be developed to interact with this
+   system.
+ - Although it is a simple solution, it depends on well defined formats for
+   documents and files to keep data storage in a sane state.
+ - It does not offer the usual query capabilities found in DB systems, so this
+   would need to be added in the application logic.
+
+## ResultsDB
+
+ResultsDB is a system specifically designed for storage of test results. It can
+store results from many different test systems and types of tests.
+
+It provides an optional web frontend, but it is built to be compatible with
+different frontend applications, which can be developed to interact with the
+stored data.
+
+Advantages:
+
+ - It has an HTTP REST interface: POST to submit results, GET to fetch results.
+ - It provides a Python API for using the JSON/REST interface.
+ - It only stores test results, but it has the `concept` of test cases in the
+   form of namespaced names.
+ - It is production ready.
+
+Disadvantages:
+
+ - The web frontend is very simple. It lacks metrics charts and grouping by
+   teams and projects.
+ - The web frontend is optional. This could involve extra configuration and
+   maintenance efforts.
+ - It seems too tied to its upstream project's system.
+
+# Proposal
+
+This section describes a solution using some of the backends discussed in the
+previous section in order to solve the test data storage problem in the Apertis
+project.
+
+This solution proposes to use a different type of storage backend for each type
+of data.
+
+SQUAD will be used to store the test result data (both manual and automated),
+and a VCS (Git is recommended) will be used to store the test case data.
+This solution also involves defining data formats, and writing a tool or a
+custom web application to guide testers through entering manual test results.
+
+Advantages:
+
+ - It is a very simple solution for all data types.
+ - It can be designed to perfectly suit the project needs.
+ - It can be easily integrated with our current infrastructure. It fits very
+   well into the current CI workflow.
+ - Storing test cases in a VCS will easily allow managing test case versions in
+   a very flexible way.
+
+Disadvantages:
+
+ - Some tools and applications need to be designed and implemented from scratch.
+ - Formats and standards need to be defined for test case files.
+ - The solution is limited to data storage; further data processing tasks
+   will need to be done by other tools (for example, test case management
+   tasks, generating test result statistics, and so on).
+
+## Test Results
+
+SQUAD will be used as the data storage backend for all the test results.
+
+This solution for receiving and storing test results fits perfectly into the
+proposed mechanism to [close the CI loop][ClosingLoopDoc].
+
+### Automated Test Results
+
+Automated test results will be received in Jenkins from LAVA using the webhook
+plugin. These results will then be processed in Jenkins and can be pushed into
+SQUAD using the HTTP API.
+
+A tool needs to be developed to properly process the test results received from
+LAVA. This data is in JSON format, which is also the format required by SQUAD,
+so it should be very simple to write a tool that translates the data into the
+exact format accepted by SQUAD.
+
+### Manual Test Results
+
+SQUAD does not offer any mechanism to input manual test results. These test
+results will need to be manually entered into SQUAD. Nevertheless, it should be
+relatively simple to develop a tool or application to submit this data.
+
+The application would need to receive the test data (for example, it can prompt
+the user in some way to input this data), and then generate a JSON file that
+will later be sent to SQUAD.
+
+The manual test results will need to be entered manually by the tester using the
+new application or tool every time a manual test is executed.
+
+### File Format
+
+All the test results will be in the standard SQUAD JSON format:
+
+ - For automated tests, Jenkins will receive the test data in the JSON format
+   sent by LAVA, then this data needs to be converted to the JSON format
+   recognised by SQUAD.
+
+ - For manual tests, a tool or application will be used by the tester to enter
+   the test data manually, and it will create a JSON file that can also be
+   recognised by SQUAD.
+
+So it can be said that the final format for all test results will be determined
+by the SQUAD backend.
+
+The test data must be submitted to SQUAD as either file attachments, or as
+regular POST parameters.
+
+There are four types of input file formats accepted by SQUAD: tests, metrics,
+metadata and attachment files.
+
+The tests, metrics and metadata files should all be in JSON format. The
+attachment files can be in any format (txt, png, and so on).
+
+All test results, both for automated and manual tests, will use these file
+formats. Here are some examples of the different types of file formats:
+
+1) Tests file: it contains the test results in `PASS/FAIL` format.
+
+---
+{
+  "test1": "pass",
+  "test2": "pass",
+  "testsuite1/test1": "pass",
+  "testsuite1/test2": "fail",
+  "testsuite2/subgroup1/testA": "pass",
+  "testsuite2/subgroup2/testA": "pass"
+}
+---
+
+2) Metrics file: it contains the test results in metrics format.
+
+---
+{
+  "test1": 1,
+  "test2": 2.5,
+  "metric1/test1": [1.2, 2.1, 3.03],
+  "metric2/test2": [200, 109, 13]
+}
+---
+
+3) Metadata file: it contains metadata for the tests. It recognises some
+special values and also accepts new fields to extend the test data with any
+relevant information.
+
+---
+{
+  "build_url": "https://<url_build_origin>",
+  "job_id": "902",
+  "job_url": "https://<original_test_run_url>",
+  "job_status": "stable",
+  "metadata1": "metadata_value1",
+  "metadata2": "metadata_value2",
+  ....
+}
+---
+
+4) Attachment files: these are any arbitrary files that can be submitted to
+SQUAD as part of the test results. Multiple attachments can be submitted to
+SQUAD during a single POST request.
+
+### Mandatory Data Fields
+
+The following metadata fields are mandatory for every test file submitted to
+SQUAD, and must be included in the file: `source`, `image.version`,
+`image.release`, `image.arch`, `image.board`, and `image.type`.
+
+The metadata file also needs to contain the list of test cases executed for the
+test job, and their types (manual or automated). This will help to identify the
+exact test case versions that were executed.
+
+This metadata will help to identify the test data environment, and it
+essentially maps to the same metadata available in the LAVA job definitions.
+
+This data should be included both for automated and manual tests, and it can be
+extended with more fields if necessary.
+
+### Processing Test Results
+
+In the end, all test results (both manual and automated) will be stored in a single
+place, the SQUAD database, and the data will be accessed consistently using the
+appropriate tools.
+
+The SQUAD backend won't make any distinction between storing manual and automated
+test results, but they will contain their respective type in the metadata so that
+they can be appropriately distinguished by the processing tools and user
+interfaces.
+
+Further processing of all the test data can be done by other tools that can use
+the respective HTTP API to fetch this data from SQUAD.
+
+All the test result data will be processed by two main tools:
+
+ 1) Automated Tests Processor
+
+    This tool will receive test results in the LAVA JSON format and convert them
+    to the JSON format recognised by SQUAD.
+
+    This should be developed as a command line tool that can be executed from
+    the Jenkins job receiving the LAVA results.
+
+ 2) Manual Tests Processor
+
+    This tool will be manually executed by the tester to submit the manual
+    test results and will create a JSON file with the test data which can then
+    be submitted to SQUAD.
+
+Both tools can be written in the Python programming language, using the `json`
+module to handle the test result data and the `requests` module in order to
+submit the test data to SQUAD.
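+
+As an illustration, the automated tests processor could follow the sketch
+below. This is not a finalised design: the exact shape of the LAVA result
+entries and the helper names are assumptions made for the example; only the
+submission path follows the scheme described in the next section.
+
+---
+import json
+
+import requests
+
+
+def lava_to_squad(lava_results):
+    # LAVA reports one entry per executed test case; SQUAD expects a flat
+    # "suite/test": "pass|fail" mapping, as shown in the tests file above.
+    tests = {}
+    for result in lava_results:
+        name = "%s/%s" % (result["suite"], result["name"])
+        tests[name] = "pass" if result["result"] == "pass" else "fail"
+    return tests
+
+
+def submit_to_squad(tests, metadata, token):
+    # The URL follows the /api/submit/:team/:project/:build/:environment
+    # scheme, using the mandatory metadata fields listed earlier.
+    url = "https://squad.apertis.org/api/submit/apertis/%s/%s/%s" % (
+        metadata["image.release"],
+        metadata["image.version"],
+        metadata["image.arch"],
+    )
+    # Tests and metadata are sent as regular POST parameters.
+    response = requests.post(
+        url,
+        headers={"Auth-Token": token},
+        data={"tests": json.dumps(tests), "metadata": json.dumps(metadata)},
+    )
+    response.raise_for_status()
+---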
+
+### Submitting Test Results
+
+Test data can be submitted to SQUAD by triggering a POST request to the specific
+HTTP API path.
+
+SQUAD is organised around teams and projects to group the test data, so these are
+central concepts reflected in its HTTP API. For example, the API path contains
+the team and project names in the following form:
+
+---
+/api/submit/:team/:project/:build/:environment
+---
+
+Tools can make use of this API either using programming modules or invoking
+command line tools like `curl` to trigger the request.
+
+An example using the `curl` command line tool to submit all the results in the
+test file `common-tests.json` for the image release 18.06 with version
+20180527.0, including its metadata from the file `environment.json`, would
+look like this:
+
+---
+$ curl \
+    --header "Auth-Token: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx" \
+    --form tests=@common-tests.json \
+    --form metadata=@environment.json \
+    https://squad.apertis.org/api/submit/apertis/18.06/20180527.0/amd64
+---
+
+### Fetching Test Results
+
+Test data can be fetched from SQUAD by triggering a GET request to the specific
+HTTP API path.
+
+Basically, all the different types of files pushed to SQUAD are accessible
+through its HTTP API. For example, to fetch the test results contained in the
+tests file (previously submitted), a GET request call to the tests file path can
+be triggered like this:
+
+---
+$ curl https://squad.apertis.org/api/testruns/5399/tests_file/
+---
+
+This retrieves the test results contained in the tests file of the test run with
+ID 5399. In the same way, the metadata file for this test run can be fetched
+with a call to the metadata file path like this:
+
+---
+$ curl https://squad.apertis.org/api/testruns/5399/metadata_file/
+---
+
+The test run ID is generated by SQUAD to identify a specific test run, and it can
+be obtained by triggering some query calls using the HTTP API.
+
+Tools and applications (for example, to generate test reports and project
+statistics) can conveniently use this HTTP API either from programming modules
+or command line tools to access all the test data stored in SQUAD.
+
+## Test Cases
+
+Git will be used as the data storage backend for all the test cases.
+
+A data format for test cases needs to be defined, along with other standards
+such as the directory structure in the repository; a procedure to access and
+edit this data might also be necessary.
+
+### File Format
+
+A well defined file format is required for the test cases.
+
+The proposed solution is to reuse the LAVA test definition file format for all
+test cases, both for manual and automated tests.
+
+The LAVA test definition files are YAML files that contain the instructions to
+run the automated tests in LAVA, and they are already stored in the automated
+tests Git repository.
+
+In essence, the YAML format would be extended to add all the required test case
+data to the automated test definition files, and new test definition YAML files
+would be created for manual test cases, which would follow the same format as
+for the automated test cases.
+
+In this way, all test cases, both for automated and manual tests, will be
+available in the same YAML format.
+
+The greatest advantage of this approach is that it will avoid the current issue
+of test case instructions differing from the executed steps in automated tests,
+since the test case and the definition file will be the same document.
+
+The following examples are intended to give an idea of the file format for the
+manual and automated test cases. They are not in the final format and only serve
+as an indicator of the format that will be used.
+
+An example of the automated test case file for the `librest` test. This test
+case file will be executed automatically by LAVA:
+
+---
+metadata:
+  format: "Lava-Test-Shell Test Definition 1.0"
+  name: librest
+  type: unit-tests
+  exec-type: automated
+  target: any
+  image-type: any
+  description: "Run the unit tests that ship with the library against the running system."
+  maintainer: "Luis Araujo <luis.araujo@collabora.co.uk>"
+
+  pre-conditions:
+    - "Ensure you have the development repository enabled in your sources.list and you have recently run apt-get update."
+    - "Ensure Rootfs is remounted as read/write"
+    - sudo mount -o remount,rw /
+
+install:
+  deps:
+    - librest-0.7-tests
+
+run:
+  steps:
+    - common/run-test-in-systemd --user=user --timeout=900 --name=run-test env DEBUG=2 librest/automated/run-test.sh
+
+parse:
+  pattern: ^(?P<test_case_id>[a-zA-Z0-9_\-\./]+):\s*(?P<result>pass|fail|skip|unknown)$
+
+expected:
+  - "PASSED or FAILED"
+---
+
+An example of the manual test case file for the `webkit2gtk-aligned-scroll`
+test. This test case can be manually read and executed by the tester, but
+ideally a new application should be developed to read this file and guide the
+tester through each step of the test case:
+
+---
+metadata:
+  format: "Manual Test Definition 1.0"
+  name: webkit2gtk-aligned-scroll
+  type: functional
+  exec-type: manual
+  target: any
+  image-type: any
+  description: "Test that scrolling is pinned in a given direction when started mostly towards it."
+  maintainer: "Luis Araujo <luis.araujo@collabora.co.uk>"
+
+  resources:
+    - "A touchscreen and a mouse (test with both)."
+
+  pre-conditions:
+    - "Ensure you have the development repository enabled in your sources.list and you have recently run apt-get update."
+    - "Ensure Rootfs is remounted as read/write"
+    - sudo mount -o remount,rw /
+
+install:
+  deps:
+    - webkit2gtk-testing
+
+run:
+  steps:
+    - GtkClutterLauncher -g 400x600 http://gnome.org/
+    - "Try scrolling by starting a drag diagonally"
+    - "Try scrolling by starting a drag vertically"
+    - "Try scrolling by starting a drag horizontally, ensure you can only pan the page horizontally"
+
+expected:
+  - "When the scroll is started by a diagonal drag, you should be able to pan the page freely"
+  - "When the scroll is started by a vertical drag, you should only be able to pan the page vertically,
+    regardless of if you move your finger/mouse horizontally"
+  - "When the scroll is started by a horizontal drag, you should only be able to pan the page horizontally,
+    regardless of if you move your finger/mouse vertically"
+
+example:
+  - video: https://wiki.apertis.org/static/Aligned-scroll.ogv
+
+notes:
+  - "Both mouse and touchscreen need to PASS for this test case to be considered a PASS.
+    If either does not pass, then the test case has failed."
+---
+
+### Mandatory Data Fields
+
+A test case file should at least contain the following data fields for both the
+automated and manual tests:
+
+---
+format: This is used to identify the format version.
+name: Name of the test case.
+type: This could be used to define a series of test case types (functional, sanity,
+      system, unit-test).
+exec-type: Manual or automated test case.
+image-type: This is the image type (target, minimal, ostree, development, SDK).
+image-arch: The image architecture.
+description: Brief description of the test case.
+priority: low, medium, high, critical.
+run: Steps to execute the test.
+expected: The expected result after running the test.
+---
+
+The test case file format is very extensible and new fields can be added as
+necessary.
+
+### Git Repository Structure
+
+A single Git repository can be used to store all test case files, both for
+automated and manual tests.
+
+Currently, the LAVA automated test definitions are located in the git repository
+for the project tests. This repository contains all the scripts and tools to run
+tests.
+
+All test cases could be placed inside this git repository. This has the great
+advantage that both test instructions and test tools will be located in the
+same place.
+
+The git repository will need to be cleaned and organised to adapt it to contain
+all the available test cases. A directory hierarchy can be defined to organise
+all test cases by domain and type.
+
+For example, the path `tests/networking/automated/` will contain all automated
+tests for the networking domain, the path `tests/apparmor/manual/` will contain
+all manual tests for the apparmor domain, and so on.
+
+Further tools and scripts can be developed to keep the git repository hierarchy
+structure in a sane and standard state.
+
+### Updates and Changes
+
+Since all test cases will be available from a git repository, and they are
+plain YAML files, they can be edited like any other file from that repository.
+
+At the lowest level, the tester or developer can use an editor to edit these
+files, though it is also possible to develop tools or a frontend to help with
+editing and at the same time enforce a certain standard on them.
+
+### Execution
+
+The test case files will be in the same format both for automated and manual
+tests, though the way they are executed differs.
+
+Automated test cases will continue to be automatically executed by LAVA, and for
+manual test cases a new application could be developed to assist the tester in
+going through the steps of the test definition files.
+
+This application can be a tool or a web application that, besides guiding the
+tester through each step of the manual test definition file, will also collect
+the test results and convert them to JSON format, which can then be sent to the
+SQUAD backend using the HTTP API.
+
+In this way, both types of test cases, manual and automated, would follow the
+same file format and live in the same git repository; they would be executed by
+different applications (LAVA for automated tests, a new application for manual
+tests), and both types of test results would conveniently use the same HTTP API
+to be pushed into the SQUAD data storage backend.
+
+### Visualisation
+
+Though a Git repository offers many advantages for managing the test case files,
+it is not a friendly option for users to access and read test cases.
+
+One solution is to develop an application that can render these test case files
+from the git repository into HTML or another format and publish them on a server
+where they can be conveniently accessed by users, testers and developers.
+
+In the same way, other tools to collect statistics, or to generate other kinds of
+information about test cases, can be developed to interact with the git
+repository to fetch the required data.
+
+## Test Reports
+
+A single application, or several different ones, can be developed to generate
+different kinds of report.
+
+These applications will need to trigger a GET request to the SQUAD HTTP API to
+fetch the specific test results (as explained in the
+[Fetching Test Results](#fetching-test-results) section) and generate the
+report pages or documents using that data.
+
+These applications can be developed as command line tools or web applications
+that can be executed periodically or as needed.
+
+## Versioning
+
+Since all the test cases, both for manual and automated tests, will be available
+as YAML files from a Git repository, these files can be versioned and linked to
+the corresponding test runs.
+
+Test case groups will be versioned using Git branches. For every image release,
+the test cases repository will be branched with the same version (for example
+18.03, 18.06, and so on). This will match the whole group of test cases against
+an image release.
+
+A specific test case can also be identified using the `HEAD` commit of the
+repository from which it is being executed. It should be relatively simple to
+retrieve the `commit` id from the git repository during test execution and add
+it to the metadata file that will be sent to SQUAD to store the test results. In
+this way, it will be possible to locate the exact test case version that was
+used for executing the test.
+
+For automated test cases, the commit version can be obtained from the LAVA
+metadata, and for manual test cases, the new tool executing the manual tests
+should take care of retrieving the commit id. Once the commit id is available,
+it should be added to the JSON metadata file that will be pushed along with the
+test result data to SQUAD.
+
+## SQUAD Configuration
+
+Some configuration is required to start using SQUAD in the project.
+
+Groups, teams and projects need to be created and configured with the correct
+permissions for all the users. Depending on the implementation, some of these
+values will need to be configured every quarter (for example, if new projects
+should be created for every release).
+
+Authentication tokens need to be created for the users and tools required to
+submit test results using the HTTP API.
+
+## Workflow
+
+This section describes the workflow for each of the components in the proposed
+solution.
+
+### Automated Test Results
+
+ - Automated tests are started by the Jenkins job responsible for triggering
+   tests.
+ - The Jenkins job waits for automated test results using a webhook plugin.
+ - Once test results are received in Jenkins, they are processed with the tool
+   that converts the test data into SQUAD format.
+ - After the data is in the correct format, it is sent by Jenkins to SQUAD
+   using the HTTP API.
+
+### Manual Test Results
+
+ - The tester manually executes the application to run manual tests.
+ - This application will read the instructions from the manual test definition
+   files in the git repository and will guide the tester through the different
+   test steps.
+ - Once the test is completed, the tester enters the results into the application.
+ - A JSON file is generated with these results in the format recognised by SQUAD.
+ - This same application or a new one could be used by the tester to send the
+   test results (JSON file) into SQUAD using the HTTP API.
+
+### Test Cases
+
+ - Test case files can be edited using any text editor in the Git repository.
+ - A Jenkins job could be used to periodically generate HTML or PDF pages from
+   the test case files and make them available from a website for easy and
+   convenient access by users and testers.
+ - Test cases will be automatically versioned once a new branch is created in
+   the git repository, which is done for every release.
+
+### Test Reports
+
+ - Reports will be generated either periodically or manually by using the new
+   reporting tools.
+ - The SQUAD frontend can be used by all users to easily check test results
+   and generate simple charts showing the trend for test results.
+
+## Requirements
+
+This gives a general list of the requirements needed to implement the proposed
+solution:
+
+ - The test case file format needs to be defined.
+ - A directory hierarchy needs to be defined for the tests Git repository to
+   contain the test case files.
+ - Develop tools to help work with the test case files (for example, a syntax
+   and format checker, or a repository sanity checker).
+ - Develop a tool to convert test data from the LAVA format into the SQUAD format.
+ - Develop a tool to push test results from Jenkins into SQUAD using the HTTP API.
+ - Develop an application to guide execution of manual test cases.
+ - Develop an application to push manual test results into SQUAD using the HTTP
+   API (this can be part of the application to guide manual test case execution).
+ - Develop a tool or web application to generate the weekly test report.
+
+## Deployment Impact
+
+All the additional components proposed in this document (SQUAD backend, new tools,
+web application) are not resource intensive and do not set any new or special
+requirements on the hosting infrastructure.
+
+The instructions for the deployment of all the new tools and services will be made
+available with their respective implementations.
+
+Following is a general overview of some important deployment considerations:
+
+ - SQUAD will be deployed using a Docker image. SQUAD is a Django application,
+   and using a Docker image makes the process of setting up an instance very
+   straightforward with no need for special resources, packaging all the required
+   software in a container that can be conveniently deployed by other projects.
+
+   The recommended setup is to use a Docker image for the SQUAD backend and
+   another one for its PostgreSQL database.
+
+ - The application proposed to execute manual tests and collect their results will
+   serve mainly as an extension to the SQUAD backend; therefore its requirements
+   will also be in accordance with the SQUAD deployment from an infrastructure
+   point of view.
+
+ - Other new tools proposed in this document will serve as the components to
+   integrate the whole workflow of the new infrastructure, so they won't require
+   special efforts and resources beyond the infrastructure setup.
+
+# Limitations
+
+This document only describes the test data storage issues and proposes a
+solution for those issues, along with the minimal test data processing required
+to implement the reporting and visualisation mechanisms on top of it. It does
+not cover any API in detail and only gives a general overview of the required
+tools to implement the proposed solution.
+
+# Links
+
+* Apertis Tests
+
+https://gitlab.apertis.org/infrastructure/apertis-tests
+
+* Weekly Test Report Template Page
+
+https://wiki.apertis.org/QA/WeeklyTestReport
+
+* SQUAD
+
+https://github.com/Linaro/squad
+https://squad.readthedocs.io/en/latest/
+
+* ResultsDB
+
+https://fedoraproject.org/wiki/ResultsDB
+
+* CouchDB
+
+http://couchdb.apache.org/
+
+[ClosingLoopDoc]: closing-ci-loop.md
+
diff --git a/content/designs/testcase-dependencies.md b/content/designs/testcase-dependencies.md
new file mode 100644
index 0000000000000000000000000000000000000000..e96133fc7547c232a2a69a865ba1f8a7fa554808
--- /dev/null
+++ b/content/designs/testcase-dependencies.md
@@ -0,0 +1,178 @@
+---
+title: Test case dependencies on immutable rootfs
+short-description: Ship test case dependencies avoiding changes to the rootfs images.
+authors:
+  - name: Denis Pynkin
+  - name: Emanuele Aina
+  - name: Frederic Dalleau
+---
+
+# Test case dependencies on immutable rootfs
+
+## Overview
+
+Immutable root filesystems have several security and maintainability advantages,
+and avoiding changes to them increases the value of testing, as the system under
+test would closely match the production setup.
+
+This is fundamental for setups that don't have the ability to install packages
+at runtime, like OSTree-based deployments, but it's largely beneficial for
+package based setups as well.
+
+To achieve that, tests should then ship their own dependencies in a
+self-contained way and not rely on package installation at runtime.
+
+## Possible solutions
+
+For adding binaries into an OSTree-based system, the following approaches are possible:
+- Build the tests separately on Jenkins and have them run from
+  `/var/lib/tests`;
+- Create a Jenkins job to extract tests from their `.deb` packages
+  shipped on OBS and to publish the results, so they can be run from
+  `/var/lib/tests`;
+- Use a layered filesystem to install binaries on top of the testing image;
+- Publish a separate OSTree branch for tests, created at build time from the same
+  OS pack as the image under test;
+- Produce OSTree static deltas at build time from the same OS pack as the image
+  under test, with additional packages/binaries installed;
+- Create a mechanism for `dpkg`, similar to the RPM-OSTree project*, to allow
+  installation of additional packages in the same manner as we have now.
+
+  * Creating a `dpkg-ostree` project would use a lot of time and human
+    resources due to the changes required in the `dpkg` and `apt` system utilities.
+
+## Overview of applicable approaches
+
+### Rework tests to ship their dependencies in `/var/lib/tests`
+
+Build the tests separately and have them run from `/var/lib/tests`, or
+create a Jenkins job to extract tests from their `.deb` packages to `/var/lib/tests`.
+
+#### Pros:
+- 'clean' testing environment -- the image is not polluted by additions, so
+  tests and dependencies have no influence on the software installed on the image
+- possibility to install additional packages/binaries at runtime
+
+#### Cons:
+- some binaries/scripts expect to find their dependencies in standard places --
+  additional changes are needed to create the directory with the relocated test
+  tools installed
+- we need to be sure that software from packages works well from a relocated
+  directory
+- additional effort is needed to maintain 2 versions of some packages, and/or
+  packaging for some binaries/libraries might be tricky
+- can't install additional packages without some preparation at build time
+  (saving the dpkg/apt-related infrastructure or creating a tarball from
+  pre-installed software)
+- possible version mismatch between the software installed in the testing image
+  and the software from the tests directory
+- problems in dependency installation are detected only at runtime
+
+### OSTree branch or static deltas usage
+
+Both approaches are based on the native OSTree upgrade/rollback mechanism -- only
+the transport differs.
+
+#### Pros:
+- testing of the OSTree upgrade mechanism is integrated
+- easy to create and maintain branches for different groups of tests -- so
+  only the software needed for the group is installed during the tests
+- a developer can obtain the same environment as used in LAVA with a few
+  `ostree` commands
+- problems with the installation of dependencies for the test are detected at
+  build time
+- the original image does not need to have `wget`, `curl` or any other download
+  tool -- the `ostree` tool has its own mechanism for downloading the needed
+  commit from the test branches
+- with OSTree static deltas we are able to test 'offline' upgrades without
+  network access
+- saves a lot of disk space for the infrastructure due to OSTree repository usage
+
+#### Cons:
+- 'dirty' testing environment -- the list of packages is not the same as in the
+  testing image; e.g. system locations for binaries and libraries are used by
+  the additional packages installed, and changes in the system configuration
+  might occur (the same behaviour we have in the current test system with the
+  installation of additional packages via `apt`)
+- not possible to install additional packages at runtime
+- additional branch(es) should be created at build time
+- a reboot is needed to apply the test environment
+- in the case of OSTree static deltas, creation of the delta is an expensive
+  operation in terms of time and resource usage
+
+### OSTree overlay
+
+An overlay is a native option provided by the OSTree project, re-mounting the
+`/usr` directory in read/write mode on top of 'overlayfs'. This allows adding
+any software into `/usr`, but the changes will disappear just after reboot.
+
+#### Pros:
+- limited possibility to install additional packages at runtime (with the saved
+  state of `dpkg` and `apt`) -- a merged `/usr` is desirable
+- possibility to copy/unpack prepared binaries directly into the `/usr` directory
+- able to use the OSTree pull/checkout mechanism to apply the overlay
+
+#### Cons:
+- 'dirty' testing environment -- the list of packages is not the same as in the
+  testing image
+- the OSTree branch should contain only `/usr` if used; otherwise methods
+  foreign to OSTree need to be used to store the binaries and/or filesystem tree
+- can't apply additional software without some preparation at build time
+  (saving the dpkg/apt-related infrastructure, creating a tarball from
+  pre-installed software, or creating an OSTree branch)
+- possible version mismatch between the software installed in the testing image
+  and the software from the tests directory
+- problems in dependency installation are detected only at runtime
+
+## Overall proposal
+
+The proposal consists of a transition from a fully apt-based test mechanism to
+a more independent test mechanism.
+
+Each test will be pulled out of `apertis-tests` and moved to its own git
+repository. During the move, the test will be made relocatable, and its
+dependencies will be reduced.
+
+Dependencies that cannot be removed will be added to the test itself.
+
+At any time, it will still be possible to run the old tests on the non-OSTree
+platform. The new tests that have already been transitioned can run on both
+OSTree and apt platforms.
+
+The following steps are envisioned.
+
+### Create a separate git repository for each test
+
+In order to run the tests on LAVA, the use of git is recommended.
+LAVA is already able to pull test definitions from git, but it can pull only one
+git repository for each test.
+
+To satisfy this constraint, each test's definition, scripts, and
+dependencies must be grouped in a single git repository.
+
+In order to run the tests manually, GitLab is able to dynamically build a
+tarball with the content of a git repository at any time. The tarball can be
+retrieved at a specific URL.
+By specifying a branch other than master, a release-specific test can be
+generated.
+A tool such as `wget` or `curl` can be used, or it might be necessary to
+download the test tarball from a host, and copy it to the device under test
+using `scp`.
+
+### Reduce dependencies
+
+To minimize the impact of the tests' dependencies on the target environment,
+some dependencies need to be dropped. For example, Python itself requires several
+megabytes of binaries and dependencies, so all the Python scripts will
+need to be rewritten as POSIX shell scripts or compiled binaries.
+
+For tests using data files, the data should be integrated in the git repository.
+
+### Make tests relocatable
+
+Most of the tests rely on static paths to find binaries. It is straightforward
+to modify a test to use a custom `PATH` instead of a static one. This custom
+`PATH` would point to a subdirectory in the test repository itself.
+
+This applies to dependencies which could be relocated, such as statically
+linked binaries, scripts, and media files.
+
+For the test components that might not be ported easily, such as
+AppArmor profiles that are designed to work on binaries at fixed locations, a
+case-by-case approach needs to be taken.
diff --git a/content/designs/text-to-speech.md b/content/designs/text-to-speech.md
new file mode 100644
index 0000000000000000000000000000000000000000..5f8f1f1005479da57846d5d63cf34243f5740d18
--- /dev/null
+++ b/content/designs/text-to-speech.md
@@ -0,0 +1,1128 @@
+---
+title: Text To Speech
+short-description: Documents possible approaches to designing an API for Text To Speech services
+  (unimplemented)
+authors:
+  - name: Philip Withnall
+---
+
+# Text To Speech
+
+## Introduction
+
+This document describes possible approaches to designing an API for text to
+speech (TTS) services for an Apertis system in a vehicle.
+ +This document proposes an API for the text to speech service in [][Appendix: A suggested TTS API]. +This API is not finalised, and is merely a suggestion. It may be +refined in future versions of this document, or when implementation is +started. + +The major considerations with a TTS API are: + + - Simple API for applications to use + - Swappable voices through the application bundling system and + application store + - Output priorities controlled by the same set of audio manager + policies which control other application audio output + +## Terminology and concepts + +### Text to speech (TTS) + +*Text to speech* (TTS) is the process of converting a string of text +into spoken words in the user’s language, to be outputted as an audio +stream. + +### Voice + +In the context of TTS, a *voice* is an engine for producing spoken +words. As with the conventional meaning of the word, the voice may have +certain characteristics, such as gender, regionality or manners of +speech. The most important quality of a voice is its understandability +and correctness of pronunciation. + +## Use cases + +A variety of use cases for application usage of TTS services are given +below. Particularly important discussion points are highlighted at the +bottom of each use case. + +### News application + +The user has installed a news application, and wants it to read the +headlines and articles aloud as they drive. If they are waiting in a +traffic queue, they want to be able to quickly find the current +paragraph in the article on-screen so they can read it themselves to +speed things up. + +### Back in a news application + +The user has a news reader application open on a specific article, which +is being read aloud. The user presses the back button to close the +article and return to the list of headlines. TTS output needs to stop +for that article. If an audio source was playing before the user started +reading the article (for example, some music), its playback may be +resumed where it was paused. + +### New e-mail notification + +The user’s e-mail client is reading an e-mail aloud to the user, +scrolling the e-mail as reading progresses. A new e-mail arrives, which +causes a ‘new e-mail’ notification to be sent to the TTS system. + +The OEM wants control over the policy of how the two TTS requests are +played: + + - The system could pause reading the original e-mail, read the + notification, then resume reading the original e-mail; or + - it could pause reading the original e-mail, read the notification, + then *not* resume reading the original e-mail; or + - it could continue reading the original e-mail at a lower volume, and + read the notification louder mixed over the top. + +The OEM wants these policies to not be overridable by any +application-specific policy such as the ones described in +[][New e-mail notification then going back], [][New meeting notification then cancelled], +[][Incoming phone call]. + +### New e-mail notification then going back + +The user’s e-mail client is reading an e-mail aloud to the user, +scrolling the e-mail as reading progresses. A new e-mail arrives, which +causes a ‘new e-mail’ notification to be sent to the TTS system. This +pauses reading the original e-mail and starts reading the notification, +as notifications have a higher priority than reading e-mails. + +While the notification is being read, the user presses the ‘back’ button +to go back to their list of e-mails. 
This should cancel reading out the +old e-mail (which is currently paused), but should not cancel the ‘new +e-mail’ notification, which is still being played. + +### New meeting notification then cancelled + +The user’s e-mail client is reading them an invitation to a meeting. +While reading the invitation, the meeting is cancelled by the organiser, +and a notification is displayed informing the user of this. This +notification is read by the TTS system, interrupting it reading the +original meeting invitation. Once the notification has finished being +read, the e-mail client should not resume reading the original +invitation. + +### Incoming phone call + +The user’s e-mail client is reading an e-mail aloud to the user. +Part-way through reading, a phone call is received. TTS output for the +e-mail needs to be automatically paused while the phone ringtone is +played and the call takes place. Once the call has finished, the e-mail +application may want to continue reading the user’s e-mail aloud, or may +cancel its output. + +### Voice installed with the SDK + +A developer wants to develop an application using the SDK with TTS +functionality, and needs to test it using a voice available in the SDK. + +### Installable voice bundle + +A user does not like how the default TTS voice for their vehicle sounds, +and wishes to change it to another voice which they can download from +the Apertis application store. They wish this new voice to be used by +default in future. + +### Voice backend in the automotive domain + +An OEM may wish to provide a proprietary TTS voice as part of the +software in their automotive domain. They want this voice to be used as +the default for TTS requests from the CE domain as well. + +### Installable languages + +A vehicle has already been released in various countries, but the OEM +wishes to expand into other countries. They need to add support for +additional languages to the TTS system. + +### Voice configuration + +The user finds that the TTS system reads text too slowly for them, and +they wish to speed it up. They edit their system preferences to increase +the speed, and want this to take effect across all applications which +use TTS. + +### Per-request emphasis + +A news reader application needs to differentiate between TTS output for +article headings and bodies. It wishes to read headings slightly louder +and more slowly than it reads bodies. However, the application must not +be allowed to make TTS requests so loud that they distract the driver. + +### Non-phonetic place names + +The navigation application is reading turn-by-turn route guidance aloud, +including place names. Various place names are not pronounced +phonetically, and the navigation system needs to make sure the TTS +system pronounces them correctly. + +### Driving abroad + +When driving abroad, the navigation application needs to read the +instructions “Turn left at the next junction, signposted ‘Paris nord’.â€, +a sentence which contains both English and French. The speech in each +language should be pronounced using the correct pronunciation rules for +that language. + +### Multiple concurrent TTS requests + +The user is listening to their e-reader read a book aloud using TTS, +while they are driving and using the audio turn-by-turn instructions +from the navigation application. 
Whenever the navigation application +needs to read an instruction, the e-reader output should be temporarily +paused or its volume reduced, and resumed after the navigation +instruction has been read, so that the user doesn’t get confused. + +*It is understood that the current quality of TTS implementations is not +sufficient to read an e-book to the user without causing them +significant discomfort. This use case is intended to demonstrate the +need for the system to handle multiple pending TTS requests. E-reader +output may become possible in the future.* + +### Permissions to use TTS API + +The user has installed a game application for their passenger to play, +and wants to be sure that it will not start reading instructions aloud +using the TTS service while they are driving. They want to disallow the +application permission to use the TTS API — either entirely, or just +while driving. + +### Multiple output speakers + +A vehicle has a single main field speaker, plus two sets of headphones. +Each set of headphones is associated with a different head unit. TTS +audio which pertains to the entire system should be output through all +three speakers; TTS audio which pertains to an application only on one +of the head units should only be output through that head unit’s +headphones. + +### Custom TTS implementation in an application + +An application developer wants to port an existing application from +another platform to Apertis. The application is a large one, and has its +own tightly integrated TTS system which would output directly to the +audio manager. This must be possible. + +## Non-use-cases + +The following use cases are not in scope to be handled by this design — +but they may be in scope to be handled by other components of Apertis. +Further details are given in each subsection below. + +### Accessibility for users with reduced vision + +While TTS is often used in software to provide accessibility for users +with reduced vision, who otherwise cannot see the graphical UI clearly, +that is not a goal of the TTS system in Apertis. It is intended to +reduce driver distraction by reducing the need for the driver to look at +the graphical UI, rather than making the UI more accessible. + +## Requirements + +### Basic TTS API + +Implement a basic TTS API with support for speaking text; and pausing, +resuming and cancelling specific requests. + +See [][News application], [][Back in a news application], +[][New e-mail notification then going back]. + +### Progress signalling API + +The TTS system must be able to signal an application as output +progresses through the current request. Signals must be supported for +output start and end, and may be supported for per-word progress through +the text. Signals must also be supported for pausing and resuming +output. + +These signals are intended to be used to update the client application’s +UI to correspond to the output progress. For example, if a notification +is being read aloud, the notification window should be hidden when, and +only when, output is finished. + +See [][News application], [][New e-mail notification then going back] + +### Output policy decided by audio manager + +The policy deciding which TTS requests are played, which are paused, +when they are resumed, and which are cancelled altogether, must be +determined by the system’s audio manager. 
+
+An application may be able to implement its own policy (for example, to
+always cancel a TTS request if it is paused), but it must not be able to
+override the audio manager’s policy, for example by preventing a request
+from being paused, or by increasing the priority of a request so it is
+played in preference to another.
+
+If the audio manager corks a TTS output stream (for example, if all
+audio output needs to be stopped in order to handle a phone call), the
+TTS daemon must pause the corresponding client application request, and
+notify the application.
+
+Once the output stream is uncorked, the client application request must
+be resumed, and the application notified, unless the application has
+cancelled that request in the meantime. By cancelling the request in the
+signal handler, a client application can ensure that TTS output is not
+resumed after the stream would have been uncorked, allowing for various
+resumption policies to be implemented.
+
+See [][New e-mail notification then going back],
+[][New meeting notification then cancelled], [][Incoming phone call].
+
+### Output streams are mixable
+
+Multiple TTS audio streams from within a single application, and from
+multiple applications, must be mixable by the audio manager, to allow
+implementing the policy of lowering the volume of one stream while
+playing a more important stream over the top.
+
+See [][New e-mail notification].
+
+### Runtime-swappable voice backends
+
+The TTS system must support different voice backends. Only one backend
+has to be active at once, but backends must be swappable at runtime if,
+for example, the user installs a new voice from the store, or if the OEM
+installs a voice backend supporting more languages
+(see [][Installable voice backends]).
+
+TTS requests queued or being output at the time a new voice backend is
+selected should continue using the old voice. New TTS requests should
+use the new voice.
+
+See [][Voice installed with the SDK], [][Voice configuration].
+
+### Installable voice backends
+
+The user must be able to install additional voices from the Apertis
+application store, and an OEM must be able to install additional voices
+before sale of a vehicle to support additional languages. These voices
+must be available to choose as the default for all TTS output.
+
+See [][Installable voice bundle], [][Installable languages].
+
+### Default SDK voice backend
+
+A voice backend must be shipped with the SDK by default, to allow
+application development against the TTS system.
+
+See [][Voice installed with the SDK].
+
+### Voice backends are not latency sensitive
+
+Some vehicles may have a TTS voice backend implemented in the automotive
+domain, which means all TTS requests would be carried over the
+inter-domain communications link, incurring some latency. The TTS system
+must not be sensitive to this latency.
+
+See [][Voice backend in the automotive domain].
+
+### System-wide voice configuration
+
+The system must have a single default voice, which is used for all TTS
+output. The configuration settings for this voice must be settable in
+the system preferences, but not settable by individual applications.
+
+Specific preferences, such as volume or speech rate, may be settable on
+a per-application basis to modify the system-wide defaults if needed.
+These modifications must have limited ability to distract the driver.
+For example, an application may apply a modifier to the volume of
+between 0.8 and 1.2 times the current system-wide output volume.
+
+See [][Voice configuration].
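+
+As a minimal illustration of this requirement, the component applying the
+per-application preference could clamp any application-supplied modifier
+before using it. The function name and the exact bounds below are purely
+illustrative, not part of a proposed API:
+
+---
+def effective_volume(system_volume, app_modifier,
+                     min_modifier=0.8, max_modifier=1.2):
+    # Clamp the application's requested modifier into the allowed range,
+    # so an application cannot make its TTS output distractingly loud or
+    # inaudibly quiet; then scale the system-wide volume by it.
+    clamped = max(min_modifier, min(max_modifier, app_modifier))
+    return system_volume * clamped
+---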
+ +### Pronunciation data for non-phonetic words + +There must be a way for applications to provide pronunciations for +non-phonetic words. This may be implemented as a static list of +overrides for certain words, or may be implemented as a runtime API. +Pronunciations must be associated with a specific language, so that the +correct pronunciation is used for the user’s current system language. If +no more suitable pronunciation is available for a word, the system must +use the current voice’s default pronunciation. + +See [][Non-phonetic place names], [][Driving abroad]. + +### Per-request language support + +The TTS system must support specifying the language of each request (or +even parts of a request), so that requests which contain text in +multiple languages (for example ‘Turn left onto Rue de Rivoli’) are +pronounced correctly. + +The system language should be used by default if the application doesn’t +specify a language, or if the specified language is not supported by the +current voice. + +See [][Driving abroad]. + +### Support for concurrent requests + +The TTS system must support accepting TTS output requests from multiple +applications concurrently, and queueing them for output sequentially. + +See [][Multiple concurrent TTS requests]. + +### Prioritisation for concurrent requests + +The TTS system must support prioritising TTS requests from certain +applications over requests from other applications, according to the +urgency of the output (for example, turn-by-turn navigation instructions +are more urgent than news reading). Similarly, it must support +prioritising requests from within a single application. + +Prioritisation must be performed on a per-request basis, as one +application may make requests which are high and low priority. Note that +this does not necessarily mean that the priority policy is implemented +in the TTS system; it may be implemented in the audio manager. This +requirement simply means that the TTS API must expose support for +prioritising requests, and must forward that prioritisation information +as ‘hints’ to whichever component implements the priority policy. + +See [][Multiple concurrent TTS requests]. + +### Output routing policy + +On high-end vehicles, there may be multiple output speakers, attached to +different head units. The audio manager must be able to associate each +TTS request with an application so that it can determine which speaker +or speakers to play the audio on. + +See [][Multiple output speakers]. + +### Permission for using TTS system + +Applications must only be allowed to use the TTS system if they are +allowed to output audio. This is subject to the application’s +permissions from its manifest, and may additionally be subject to the +user’s preferences for audio output. The user may be able to temporarily +disable audio output for a specific application. + +If any TTS-specific permissions are implemented in the system, it must +be understood that an application may circumvent them by embedding its +own TTS system (or by playing pre-recorded audio files, for example). + +See [][Permissions to use TTS API], +[][Custom TTS implementation in an application]. + +## Existing text to speech systems + +This chapter describes the approaches taken by various existing systems +for allowing applications to use TTS services, because it might be +useful input for Apertis’ decision making. Where available, it also +provides some details of the implementations of features that seem +particularly interesting or relevant. 
+ +### Android + +[Android provides a text to speech API][Android-tts] for converting text to audio +to output, or to audio in a file. + +It provides an API for matching pieces of text with custom pre-recorded +sounds (which it calls ‘earcons’), for the purpose of embedding custom +noises (such as ticking noises) into TTS output, or for providing custom +pronunciations for the text. + +It supports voices which support different languages, and provides the +union of those languages to the developer, who may specify which +language the provided text is in. + +The user controls the preferences for the voice, apart from pitch and +speech rate, which applications may set individually. + +For determining the progress of the TTS engine through an utterance, the +API provides a callback function which is called on starting and ending +audio output. + +### iOS + +[iOS provides TTS support through its speech synthesiser API][iOS-tts]. In +this API, text to be spoken is passed to a new utterance object, which +allows its voice, volume, speech rate and pitch to be modified. The +utterance is then passed to the service, which queues it up to be +spoken, or starts speaking it if nothing else is queued. Methods on the +service allow output to be paused, cancelled or resumed. When pausing +speech, the API provides the option to pause immediately, or after +finishing speaking the current word. + +Progress through speaking an utterance can be tracked using a delegate, +which receives calls when speech starts, stops, is paused, resumes, and +for each word in the text as it is spoken (intended for the purposes of +highlighting words on-screen). + +It is worth noting that iOS is recognised as highly competent in the +field of accessibility for the blind or partially sighted, partly due to +its well designed TTS system. + +### Previous eCore TTS API + +The TTS API previously exposed by eCore gave a method to speak a given +string of text, a method to stop speaking, and one to check whether +speech was currently being output. It gave the choice of two voices, but +no other options for configuring them. It provided two signals for +notifying of audio output starting and ending. + +### speech-dispatcher + +[speech-dispatcher] is an abstraction layer over multiple TTS voices. +It uses a client–server architecture, where multiple clients can connect +and send text to the server to be outputted as audio. The protocol used +between clients and the server is the [Speech Synthesis Interface +Protocol], a custom text-based protocol operated over a Unix domain +socket. + +Prioritisation between text from different clients is supported, but +clients are not strictly separated by the server: one client can control +the settings and output for another client. + +The client library has C and Python APIs. The C API is pure C, and is +not GLib-based. The backend supports a few different voices (see [][TTS voices]): +Festival, espeak, pico, and a few proprietary systems. Writing a +new voice backend, to connect an existing external voice engine to +speech-dispatcher, is not a major task. + +The system supports ‘sound icons’ which associate a sound file with a +given text string, and allow that sound to be played when that string is +found in input. + +The settings allow control over the output language, whether to speak +punctuation characters, the speech rate, pitch, and volume. + +Speech output can be paused, resumed and cancelled once started. 
The API
+supports notifying when output is started, stopped, and when pre-defined
+‘index marks’ are reached in the input string.
+
+Backends for speech-dispatcher are run as separate processes,
+communicating with the daemon via stdin and stdout. They have individual
+configuration files.
+
+### TTS voices
+
+Here is a brief comparative evaluation of various TTS engines and voices
+which are available already.
+
+#### espeak
+
+ - Supports many languages (importantly, non-Latin languages)
+ - Sounds robotic
+ - Can be used with mbrola voices to make it more natural; not
+   supported very well by speech-dispatcher
+   ([http://espeak.sourceforge.net/mbrola.html](http://espeak.sourceforge.net/mbrola.html))
+ - Already packaged for Ubuntu (as are mbrola voices)
+ - [http://espeak.sourceforge.net/](http://espeak.sourceforge.net/)
+
+#### Festival
+
+ - Sounds less robotic than espeak, but still quite robotic (example
+   here:
+   [http://tts.speech.cs.cmu.edu:8083/](http://tts.speech.cs.cmu.edu:8083/))
+ - A bit slower
+ - Already packaged for Ubuntu
+ - Supports 3 languages (English, Spanish and Welsh)
+ - [http://www.cstr.ed.ac.uk/projects/festival/](http://www.cstr.ed.ac.uk/projects/festival/)
+
+#### pico
+
+ - License: Apache License v2
+ - By SVOX; used in Android
+ - Core engine written in C, with a Java wrapper on Android; a C API is
+   available in picoapi.h
+ - Supports 37 languages (importantly, non-Latin languages)
+ - Sounds very good (example here:
+   [https://svoxmobilevoices.wordpress.com/demos/](https://svoxmobilevoices.wordpress.com/demos/))
+ - Not as well tested through speech-dispatcher
+ - [https://en.wikipedia.org/wiki/SVOX](https://en.wikipedia.org/wiki/SVOX)
+ - Publicly available source:
+   [https://android.googlesource.com/platform/external/svox/](https://android.googlesource.com/platform/external/svox/)
+ - Already packaged for Debian and Ubuntu
+ - As this is a component of Android, we are not sure about the
+   openness of the development practices, and whether it’s possible to
+   get involved in them.
+ - It’s certainly possible to file bugs about the packaging with the
+   [Debian bug tracker][pico-tracker], but that won’t necessarily help
+   for bugs in the source itself.
+
+#### acapela
+
+ - Non-FOSS
+ - Best quality
+ - [http://www.acapela-group.com/](http://www.acapela-group.com/)
+
+#### Nuance
+
+ - Non-FOSS
+ - Has been used previously in eCore
+ - [http://www.nuance.com/for-business/text-to-speech/vocalizer/index.htm#demo](http://www.nuance.com/for-business/text-to-speech/vocalizer/index.htm#demo)
+
+## Approach
+
+Based on the above research ([][Existing text to speech systems])
+and [][Requirements], we recommend the following approach as an initial
+sketch of a text to speech system. A suggested API for the TTS service
+is given in [][Appendix A: Suggested TTS API].
+
+### Overall architecture
+
+As TTS output from an application is essentially another audio stream,
+and no privilege separation is required for turning a string of text
+into an audio stream, the design follows a ‘decentralised’ pattern
+similar to how GStreamer is implemented.
+
+In order to produce TTS output, an application can link to a TTS
+library, which provides functionality for turning a text string into an
+audio stream. It then outputs this audio stream as it would any other,
+sending it to the audio manager, along with some metadata including an
+unforgeable identifier for the application, and potentially other
+metadata hints for debugging purposes.
The audio manager applies the
+same priority policy which it applies to all audio streams, and
+determines whether to allow the stream to be played, pause it while
+another stream is played then resume it, or cancel it entirely. This is
+done with standard audio manager mechanisms, using PulseAudio.
+
+The TTS library receives feedback about the state of the audio channel,
+and passes this back to the application in the form of signals, which
+the application may use to update its UI, or implement its own policy
+for enqueuing or cancelling requests (or it may ignore the signals).
+
+### Alternative centralised design
+
+The other major option is for a centralised design, where all TTS
+requests are sent to a TTS service (running as a separate process),
+which decides on relative priorities for them, converts them from text
+to audio, and forwards them to the audio manager.
+
+There is no need for this design: there is no need for the additional
+privilege separation, and it complicates the application of audio
+policy, since it now has to be applied in the TTS service *and* the
+audio manager.
+
+### Use of speech-dispatcher
+
+[][speech-dispatcher] is an existing FOSS system which is the
+standard choice for systems like this. However, it is based around a
+centralised design which does not fit with our suggested architecture —
+a large part of speech-dispatcher is concerned with implementing a
+central daemon which handles connections and requests from multiple
+clients, prioritises them, then outputs them to the audio manager. As
+described in [][Overall architecture] and [][Alternative centralised design],
+this is functionality which our recommended design does not need.
+
+Additionally, speech-dispatcher has the disadvantages that it:
+
+ - does not enforce separation between clients, meaning they may
+   control each other’s output; and
+ - provides a C API which is not GLib-based, so would be hard to
+   introspect and expose in other languages (such as JavaScript).
+
+For these reasons, and due to its centralised architecture, we recommend
+*not* using speech-dispatcher. However, it may be possible and useful to
+extract relevant parts of its code and turn them into shared libraries
+to be used in the Apertis TTS library. The rest of this document will
+cover the design with no reference to speech-dispatcher, in the
+knowledge that it might substitute for some of the implementation work
+where possible.
+
+### TTS library
+
+The TTS library would be a new shared library which can be linked into
+applications to provide, essentially, the functionality of turning a
+text string into an audio stream. It would provide the following major
+APIs:
+
+ - Say a text string.
+ - Stop, pause and resume speech.
+ - Signal on starting, pausing, resuming and ending audio output, plus
+   on significant progress through output.
+ - Set the language for a request.
+ - ‘Sound icon’ API for associating audio files with specific strings.
+
+The stop, pause and resume APIs would operate on specific requests,
+rather than all pending requests from the application. This allows for
+an application to cancel one TTS output while continuing to say another;
+or to cancel one output while another is paused. The API should be
+implemented as a queue-based one, where the application enqueues a
+string to be read, and receives a handle identifying it in the queue.
+The TTS library can prioritise requests within the queue, and hence
+requests may not be processed for some time after being enqueued.
+Signals convey this information to the application.
+
+The progress signal should be emitted at the discretion of the TTS
+library, to signal significant progress to the application in outputting
+the TTS request. For example, it could be emitted once per sentence, or
+once per word, or not at all. It returns an offset (in Unicode
+characters) from the start of the input text.
+
+The library’s audio output would be provided in a format suitable for
+passing directly to PulseAudio, or into GStreamer for further
+processing.
+
+The TTS library would implement loading of a TTS backend into the
+process, and would load and apply the system settings for TTS output.
+
+### Installable and swappable backends
+
+The TTS library would implement voice backends as dynamically loaded
+shared libraries, all installed into a common directory. It must monitor
+this directory at runtime to detect newly installed voice backends; for
+an application bundle to install a new backend, it would have to install
+or symlink the library into this directory.
+
+The TTS library should not care how a voice backend is implemented
+internally, as long as it implements a standard backend interface. It
+may be possible, for example, to re-use a lot of the code from
+speech-dispatcher’s [backend modules][speechd-modules].
+
+Each voice backend must provide an interface for converting text to
+audio, and returning that audio to the TTS library — it should *not*
+implement outputting the audio to the audio manager itself. Backends
+must provide a way of enumerating and configuring their voice options
+(such as volume, pitch, accent, etc.), including a way of specifying
+that an option is read-only or unsupported. It is not expected that all
+backends will support all functionality of the TTS library.
+
+The backend interface must be tolerant of latency in the backends, in
+order to support backends which are implemented in the automotive
+domain. This means that all functions must be [asynchronous][GAsyncResult].
+
+### SDK default backend
+
+We recommend [Pico] as the default backend to ship with the SDK. It
+is freely licensed, and supports 37 languages including non-Latin
+languages. It is used on Android, so is relatively stable and mature.
+
+### Global configuration
+
+Configuration options for the voice backends should be stored in
+GSettings (see [Preferences and Persistence](preferences-and-persistence.md)),
+and should be stored once (not per-backend). The
+semantics of each configuration option must be rigorously defined, as
+each backend must convert those options to fit its own configuration
+interface. If a backend has more options in its configuration interface
+than are provided by the global TTS library configuration, it must use
+sensible, unconfigurable defaults for the other options.
+
+Configuration options may include:
+
+ - Voice to use
+ - Whether to vocalise punctuation
+ - Voice type (male or female)
+ - Speech rate
+ - Pitch
+ - Volume
+
+By storing the options in GSettings, it becomes possible to apply
+AppArmor policy to control access to them so that, for example,
+applications which use the TTS library are only allowed to read the
+settings, and only the system preferences application is allowed to
+modify them.
+
+### Per-request configuration
+
+Configuration which is exposed to applications via the TTS API could be:
+
+ - Pitch
+ - Speech rate
+ - Volume
+
+These options must be exposed purely as *modifiers* on the system-wide
+values.
These modifiers could be defined symbolically, for example as a
+set of three volume modifiers:
+
+ - Emphasised (120% of system-wide volume)
+ - Normal (100% of system-wide volume)
+ - De-emphasised (80% of system-wide volume)
+
+A non-symbolic numerical modifier might be introduced in future.
+
+The audio manager is responsible for limiting the maximum volume of any
+audio stream, to prevent a malicious or faulty application from setting
+the volume so high as to distract the driver.
+
+### Sound icons
+
+Sound icons are a feature provided by speech-dispatcher, which we could
+use as the basis for our own implementation, as this would allow re-use
+of the relevant features in voice backends.
+
+Sound icons could be used for identifying punctuation, for example, or
+for clarifying the pronunciation of certain words. It is suggested that
+applications ship their sound icons in a per-application directory,
+populated at install time; the application points the TTS library at
+this directory, and the library looks there when asked to play a sound
+icon. Each sound icon should have an
+associated language (or explicitly no associated language), so that the
+correct sound icon file can be loaded according to a TTS request’s
+language.
+
+Sound icons should be playable via a TTS library API, similarly to how
+text output is requested. They should be provided in WAV format, as this
+is what the existing speech-dispatcher backends expect.
+
+### Request prioritisation
+
+There are two dimensions to prioritisation of requests: within a single
+application, and across multiple applications.
+
+Requests from within a single application should be handled using a
+request queue within the TTS library. This allows squashing similar
+requests, or bumping the priority of a request so it is played before
+other requests from the same application.
+
+It is suggested that the speech-dispatcher [priorities][speechd-priorities]
+are used for requests within a single application, including their
+semantics. For example, the
+TTS library request queue would squash multiple progress requests so
+that only one is played at once.
+
+These priorities should be attached to audio output when it is sent to
+the audio manager, as a hint to assist it in its policy decisions.
+
+Requests from multiple applications are prioritised by the audio
+manager, which uses the audio priority of each application (whether it
+is an entertainment or interrupt source, and its numerical audio
+priority) from the application’s manifest to determine which
+requests to play, which to pause then resume, and which to cancel
+entirely. The application’s audio priority is under the control of the
+OEM, rather than the application developer, so application developers
+cannot use this to always output audio at an inflated priority and deny
+other applications audio output.
+
+> See the Audio Management design
+
+There is one situation where an application with a low priority may need
+to output a TTS request at a higher overall priority than an application
+with a high priority: when emitting a pop-up notification via the
+notification service. This should be handled by having notifications
+submitted as TTS requests by the notification service itself, rather
+than by the application which produced the notification. This allows the
+audio manager to use the notification service’s priority for policy
+decisions, rather than the original application’s priority.
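+
+To make the request flow concrete, here is a minimal sketch of a client
+submitting a prioritised request, assuming the pseudo-code API from
+[][Appendix A: Suggested TTS API] were realised as a GLib-style C
+library (all `tts_*` and `TTS_*` names here are assumptions, not a
+defined API):
+
+---
+/* Sketch: submit a high-priority request and watch its state. The
+ * priority is only a hint, which is forwarded to the audio manager. */
+static void
+on_state_changed (TtsRequest *request, GParamSpec *pspec, gpointer data)
+{
+  if (tts_request_get_state (request) == TTS_REQUEST_STATE_FINISHED)
+    g_message ("Instruction finished playing");
+}
+
+static void
+announce_turn (TtsContext *context)
+{
+  TtsRequest *request;
+
+  /* The appendix suggests an asynchronous send_request(); a direct
+   * return value is shown here for brevity. NULL means ‘use the
+   * system language’. */
+  request = tts_context_send_request (context,
+                                      "Turn left at the next junction",
+                                      TTS_PRIORITY_IMPORTANT,
+                                      NULL,
+                                      TTS_VOICE_RATE_NORMAL,
+                                      TTS_VOLUME_EMPHASIZED,
+                                      TTS_PITCH_NORMAL);
+  g_signal_connect (request, "notify::state",
+                    G_CALLBACK (on_state_changed), NULL);
+}
+---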
+
+### PulseAudio output
+
+Output from the TTS library should be sent to PulseAudio in order to be
+mixed with other TTS and non-TTS audio streams and sent to the hardware
+for output. It is PulseAudio and the audio manager which implement the
+priority policies described above.
+
+In order to differentiate TTS output from different applications,
+appropriate metadata should be attached to the audio stream to identify
+the application, its internal priority for the TTS request, and the fact
+that the audio is a TTS request (as opposed to other audio content). The
+application identifier must be unforgeable (i.e. it must come from a
+trusted source, like the kernel or D-Bus daemon), as it is used as the
+basis for policy decisions. The internal priority and TTS request flag
+are entirely under the control of the application (i.e. forgeable), and
+therefore must only be used as hints by the audio manager. Additional
+unforgeable metadata may come from the application’s manifest file,
+which is not under the control of the application developer, and can be
+uniquely looked up by the application’s trusted identifier.
+
+The audio manager most likely will *not* use forgeable metadata from the
+application, but this data could be useful for identifying audio streams
+when debugging, for example.
+
+If an application wishes to submit multiple TTS requests simultaneously,
+and have the audio manager mix them or decide which one to prioritise,
+it must have multiple connections to PulseAudio.
+
+If, as a result of applying the priority policy, the audio manager corks
+an application’s TTS output stream, the TTS library must pause the
+corresponding TTS request and notify the application using a signal.
+Once the stream is uncorked, the TTS library must unpause the request
+and notify the application again — unless the application has cancelled
+the request in the meantime, in which case the request is already
+cancelled and removed.
+
+The same is true if the audio manager *cancels* an application’s TTS
+output stream: the TTS library must cancel the corresponding TTS request
+and notify the application using a signal.
+
+Note that the audio manager’s pausing and resuming of TTS requests is
+separate from the pause and resume APIs available to the application.
+The application cannot call its resume method to resume a TTS request
+which the audio manager has paused. Similarly, the audio manager cannot
+call its resume method to resume a TTS request which the application has
+paused. This can be thought of as separately pausing or resuming both
+ends of the audio channel between an application and the audio manager.
+
+### Testability
+
+Testing the TTS system can be split into three major areas: checking
+that the TTS library and its various voice backends work; checking that
+the audio manager correctly applies its priority policies to incoming
+TTS audio streams and normal audio streams; and integration testing of
+audio output from an application calling a TTS API.
+
+The first can be achieved using unit tests within the TTS library
+project, which test various components of the library in isolation. For
+example, they could compare TTS audio output streams against stored
+‘golden’ expected output sound files.
+
+The audio manager testing should be implemented as part of the audio
+manager’s test plan, ensuring that TTS audio channel metadata is
+included in a variety of test situations.
+
+> This should be described in the Audio Management design.
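+
+A golden-file unit test for the first of these areas could look
+something like the following sketch, using the GLib test framework
+(`tts_backend_synthesize()` is a hypothetical entry point, and the
+golden file path is illustrative):
+
+---
+#include <glib.h>
+
+static void
+test_synthesise_matches_golden (void)
+{
+  /* Hypothetical call into the TTS library’s backend interface. */
+  g_autoptr(GBytes) actual = tts_backend_synthesize ("Hello world");
+  g_autofree gchar *expected = NULL;
+  gsize expected_len = 0;
+
+  g_assert_true (g_file_get_contents ("tests/golden/hello-world.wav",
+                                      &expected, &expected_len, NULL));
+
+  /* Fail if the synthesised audio differs from the stored output. */
+  g_assert_cmpmem (g_bytes_get_data (actual, NULL),
+                   g_bytes_get_size (actual),
+                   expected, expected_len);
+}
+
+int
+main (int argc, char **argv)
+{
+  g_test_init (&argc, &argv, NULL);
+  g_test_add_func ("/tts/synthesise/golden", test_synthesise_matches_golden);
+  return g_test_run ();
+}
+---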
+
+Finally, the integration testing requires the audio output to be
+checked, so is infeasible to implement as an automated test, and would
+have to be a manual test where the human tester verifies that the output
+sounds as expected for a given set of input situations (requests from a
+test client).
+
+### Security
+
+The security properties being implemented by the system are:
+
+ - Applications should be independent, in that one application cannot
+   change the TTS settings for another application, or affect another
+   application’s TTS output other than through prioritisation of
+   requests as controlled by the audio manager.
+
+ - Applications must not be able to play a TTS request if the audio
+   manager has disallowed or paused it (availability of audio output to
+   other applications).
+
+ - Applications should not be able to set the system-wide TTS
+   preferences.
+
+ - Applications should not be able to determine the content of other
+   applications’ TTS requests (confidentiality of requests).
+
+ - Applications must only be allowed to use the TTS system if they have
+   permission to output audio.
+
+These properties are achieved by separating the audio priority policies
+from the TTS library and implementing them in the audio manager. The
+audio manager has an unforgeable identifier for the application which
+originated each TTS audio stream, and the forgeable priority hints which
+come from the application are not allowed to override the application’s
+audio priority.
+
+Audio output from an application is subject to that application having
+permission to output audio, which is enforced by the audio manager.
+
+Independence and confidentiality of application audio channels is
+implemented as for all audio channels, by having separate connections
+from each application to the audio manager.
+
+Integrity of system-wide TTS preferences is implemented by the AppArmor
+policy controlling access to those preferences in GSettings.
+
+#### Loadable voice backends
+
+The TTS library, and hence each application which links to it, needs
+read-only and execute access to the loadable voice backend libraries,
+plus any resources needed by those voices. It also needs read-only
+access to the TTS system-wide preferences in GSettings.
+
+### Suggested roadmap
+
+There are few opportunities for splitting this system up into stages.
+The TTS library needs to be written first, including its loadable voice
+backend interface and the first voice backend. More complex features
+like sound icons could be ignored in the first version of the library.
+With this working, applications could start to use the TTS APIs. The
+unit tests and integration tests for the TTS library should be written
+from the very beginning.
+
+With TTS output working, a second stage could implement the priority
+policies in the audio manager, and ensure those are working. The system
+preferences could also be integrated at this stage.
+
+A third stage could produce more voice backends (if needed), potentially
+including a voice backend which is implemented in the automotive domain,
+to ensure that asynchronous calls to the backends work.
+
+It is worth highlighting that aside from initially ignoring features
+like sound icons, there is little scope for simplifying the TTS API for
+its first implementation. Specifically, we feel it would be a mistake to
+implement a non-queue-based API for scheduling TTS requests to begin
+with, and then ‘expand’ it into a queue-based API later on.
To do so
+would expose applications to a lot of semantic changes in the API which
+they would then have to adapt to use. The TTS library API should be
+implemented as a queue-based one from the start.
+
+### Requirements
+
+ - [][Basic TTS API]: Implemented as a C API on the TTS library.
+
+ - [][Progress signalling API]: Implemented using GObject signals
+   emitted by the TTS library.
+
+ - [][Output policy decided by audio manager]: Implemented by passing
+   priority and application identifiers to the audio manager, which
+   corks, uncorks, or cancels audio streams according to its
+   policy, using standard PulseAudio functionality.
+
+ - [][Output streams are mixable]: Audio manager may choose to *not*
+   cork two streams, and mix them instead.
+
+ - [][Runtime-swappable voice backends]: TTS library loads backends
+   from a directory as dynamically loaded libraries, and monitors that
+   directory for changes.
+
+ - [][Installable voice backends]: Installed or symlinked into the
+   backend library directory.
+
+ - [][Default SDK voice backend]: Pico to be shipped as the default
+   backend for the SDK.
+
+ - [][Voice backends are not latency sensitive]: Voice backend
+   interface uses asynchronous functions to avoid blocking the TTS
+   library.
+
+ - [][System-wide voice configuration]: Stored in GSettings and read
+   by the TTS library in each application which uses it. The system
+   preferences application can modify the settings in GSettings.
+
+ - [][Pronunciation data for non-phonetic words]: Provided by an API
+   in the TTS library similar to the speech-dispatcher API for ‘sound
+   icons’.
+
+ - [][Per-request language support]: Provided as a per-request API to
+   hint at the language the source text is written in.
+
+ - [][Support for concurrent requests]: Implemented by allowing
+   multiple audio channel connections to the audio manager, which
+   prioritises between them.
+
+ - [][Prioritisation for concurrent requests]: Implemented by
+   allowing multiple audio channel connections to the audio manager,
+   which prioritises between them. In-application priorities are
+   handled by a per-application request queue within the TTS library.
+
+ - [][Permission for using TTS system]: Checked by the audio manager
+   for each application which attempts to play audio (including TTS
+   output), using permissions from the application’s manifest.
+
+## Summary of recommendations
+
+As discussed in the above sections, we recommend:
+
+ - Implementing a new TTS library, using an API like the one suggested
+   in [][Appendix A: Suggested TTS API]. Parts of speech-dispatcher may
+   be used to aid the implementation if appropriate.
+
+ - Implementing voice backends as dynamically loaded libraries,
+   potentially reusing much of the existing backends from
+   speech-dispatcher.
+
+ - Modifying the audio manager to support applying a priority policy to
+   TTS requests, using the application’s audio priority, and
+   potentially logging TTS-specific metadata for debugging purposes.
+
+ - Implementing unit and integration tests for the TTS library, audio
+   manager and TTS system as a whole.
+
+ - Packaging and using Pico as the default voice backend in the SDK.
+
+ - Modifying the Apertis software installer to generate AppArmor rules
+   to allow access to the TTS voice backends and their resources, plus
+   the TTS system settings, if an application is allowed to output
+   audio.
+
+## Appendix A: Suggested TTS API
+
+The code listing is given in pseudo-code.
+
+---
+/* TTS context to contain relevant state and loaded resources and
+ * settings. */
+class TtsContext {
+  async TtsRequest send_request (const string text_to_say,
+                                 TtsPriority priority=TtsPriority.TEXT,
+                                 const string language=null,
+                                 TtsVoiceRate voice_rate=TtsVoiceRate.NORMAL,
+                                 TtsVolume volume=TtsVolume.NORMAL,
+                                 TtsPitch pitch=TtsPitch.NORMAL);
+
+  async TtsRequest send_sound_icon_request (const string icon_name,
+                                            TtsPriority priority=TtsPriority.TEXT,
+                                            const string language=null,
+                                            TtsVoiceRate voice_rate=TtsVoiceRate.NORMAL,
+                                            TtsVolume volume=TtsVolume.NORMAL,
+                                            TtsPitch pitch=TtsPitch.NORMAL);
+}
+
+/* This represents a single pending TTS request. The object may persist
+ * after the underlying request has been handled, until the application
+ * programmer unrefs the object. */
+class TtsRequest {
+  async void pause ();
+  async void resume ();
+  async void cancel ();
+
+  /* The current state of the request. */
+  property TtsRequestState state;
+
+  /* The current progress of reading through the request, as an offset
+   * into the original text in Unicode characters. */
+  property unsigned int current_offset;
+
+  /* In a GLib API, these would be GObject::notify::state and
+   * GObject::notify::current_offset. */
+  signal notify_state (TtsRequestState state);
+  signal notify_current_offset (unsigned int current_offset);
+}
+
+enum TtsRequestState {
+  PREROLL,
+  PLAYING,
+  PAUSED,
+  FINISHED,
+  CANCELLED,
+}
+
+enum TtsPriority {
+  IMPORTANT,
+  MESSAGE,
+  TEXT,
+  NOTIFICATION,
+  PROGRESS,
+}
+
+enum TtsVoiceRate {
+  SLOW,
+  NORMAL,
+  FAST,
+}
+
+enum TtsVolume {
+  DEEMPHASIZED,
+  NORMAL,
+  EMPHASIZED,
+}
+
+enum TtsPitch {
+  LOW,
+  NORMAL,
+  HIGH,
+}
+---
+
+[Android-tts]: http://developer.android.com/reference/android/speech/tts/package-summary.html
+
+[iOS-tts]: https://developer.apple.com/library/ios/documentation/AVFoundation/Reference/AVSpeechSynthesizer_Ref/index.html
+
+[speech-dispatcher]: http://devel.freebsoft.org/speechd
+
+[Speech Synthesis Interface Protocol]: http://devel.freebsoft.org/doc/speechd/ssip.html
+
+[pico-tracker]: https://bugs.debian.org/cgi-bin/pkgreport.cgi?pkg=libttspico0;dist=unstable
+
+[speechd-modules]: http://git.freebsoft.org/?p=speechd.git;a=tree;f=src/modules;hb=HEAD
+
+[GAsyncResult]: https://developer.gnome.org/gio/stable/GAsyncResult.html
+
+[Pico]: https://android.googlesource.com/platform/external/svox/
+
+[speechd-priorities]: http://devel.freebsoft.org/doc/speechd/ssip.html#Priority-Categories
diff --git a/content/designs/ui-customisation.md b/content/designs/ui-customisation.md
new file mode 100644
index 0000000000000000000000000000000000000000..246784f9aa8e14b33316a28756473ffb278254b4
--- /dev/null
+++ b/content/designs/ui-customisation.md
@@ -0,0 +1,1392 @@
+---
+title: UI customisation
+short-description: Abstracting the differences between variants into a UI library
+  (proof-of-concept)
+authors:
+  - name: Jonny Lamb
+---
+
+# UI customisation
+
+## Introduction
+
+The goal of this user interface customisation design document is to
+reduce app development time when porting between variants by abstracting
+the differences between variants into a UI library.
+
+For example, below are designs of an audio player application — on the
+left is variant A and on the right is variant B.
+
+The A variant mixes three independently scrollable lists for artist,
+album, and then track. The B variant uses one scrollable list with
+columns for the aforementioned details.
Using different widgets and
+styling it should be possible to radically change the user interface as
+above. More examples of variant changes are shown in [][Variant differences].
+
+The goal of standardising this process is to reduce the amount of code
+written or changed in customising a variant. It is understood that for
+system components, code might have to be altered for some requests, but
+code inside application bundles should remain as similar as possible and
+work in variant-specific ways automatically.
+
+## Terminology and Concepts
+
+### Vehicle
+
+For the purposes of this document, a *vehicle* may be a car, car
+trailer, motorbike, bus, truck tractor, truck trailer, agricultural
+tractor, or agricultural trailer, amongst other things.
+
+### System
+
+The *system* is the infotainment computer in its entirety in place
+inside the vehicle.
+
+### User
+
+The *user* is the person using the system, be it the driver of the
+vehicle or a passenger in the vehicle.
+
+### Widget
+
+A *widget* is a reusable part of the user interface which can be changed
+depending on location and function.
+
+### User Interface
+
+The *user interface* is the group of all widgets in place in a certain
+layout to represent a specific use-case.
+
+### Roller
+
+The *roller* is a list widget named after a cylinder which revolves
+around its central horizontal axis. As a result of being a cylinder it
+has no specific start and finish and appears endless.
+
+### Speller
+
+The *speller* is a widget for text input.
+
+### Application Author
+
+The *application author* is the developer tasked with writing an
+application using the widgets described in this document. They cannot
+modify the variant or the user interface library.
+
+### Variant
+
+A *variant* is a customised version of the system by a particular system
+integrator. Usually variants are personalised with particular colour
+schemes and logos and potentially different widget behaviour.
+
+### View
+
+A *view* is a page in an application with an independent purpose. Views
+move from one to another, and sometimes also back, to form the workflow
+of the application. For example, in a photo application the list of
+photos is one view and the highlight on one photo in particular, perhaps
+with more metadata from the photo, is another view.
+
+### Template
+
+A *template* is a text-based representation of a set of widgets in a
+view. Templates allow changes and extensions to be made without having
+to rebuild the actual code.
+
+### UI prototyping
+
+*UI prototyping* is the process of building a mock-up of a UI to evaluate how it
+looks, and how usable it is for different use cases — but without hooking up
+the UI to an application implementation or backing code. The idea is to be able
+to produce a representative UI as fast as possible, so designers and testers
+can evaluate its usability, and can produce further iterations of the design,
+without wasting time on implementing backing functionality in code until the
+design is finalised. At this point, a programmer can turn the prototype into a
+complete implementation in code.
+
+The process of prototyping is not relevant to UI *customisation*, but is
+relevant to the process of using a UI toolkit.
+
+Here is an example of some [prototype UIs], made in Inkscape.
+
+### WYSIWYG UI editing
+
+*WYSIWYG UI editing* is the process of using a UI editor, such as [Glade],
+where the UI elements can be composed visually and interactively to build the
+UI, for example by dragging and dropping them together.
The appearance of the
+UI in the designer is almost identical to its appearance when it is run in
+production.
+
+This could be contrasted with designing a UI by writing a [ClutterScript] file,
+for example, where the UI has to be run as part of a program in order to
+visualise it.
+
+## Use Cases
+
+A variety of use cases for UI customisation are given below.
+
+### Multiple Variants
+
+Each system integrator wants to use the same user interface without
+having to rewrite it from scratch (see [][Variant differences]).
+
+For example, in the speller, variant A wants to highlight the key on an
+on-screen-keyboard such that the key pops out of the keyboard, whereas
+variant B wants to highlight just the letter within the key with no pop
+out animation.
+
+As another example, in the app launcher, variant A wants to use a
+cylinder animation for rolling whereas variant B wants to scroll the
+list of applications like a flat list.
+
+#### Fixed Variants
+
+A system integrator wants multiple variants to be installable
+concurrently on the system, but wants the variant in use to be fixed and
+not changeable after being set in a configuration option. The system
+integrator wants said configuration option to be changeable without
+rebuilding.
+
+### Templates
+
+A system integrator wants to customise the user interface as easily as
+possible without recompilation of applications. The system integrator
+wants to be able to choose the widgets in use in a particular
+application user interface (from a list of available widgets) and have
+them work accordingly.
+
+For example, in a photo viewing application with one photo selected,
+system integrator A might want to display the selected photo with
+nothing else displayed, while system integrator B might want to display
+the selected photo in the centre of the display, but also have the next
+and previous photos slightly visible at the sides.
+
+#### Template Extension
+
+A system integrator wants to use the majority of an Apertis-provided
+template, but also wants to add their own variant-specific extensions.
+The system integrator wants to achieve this without copying and pasting
+Apertis-provided templates, to retain maintainability, and wants to add
+their own extension template which merely references the
+Apertis-provided one.
+
+For example, said system integrator wants to use an Apertis-provided
+button widget, but wants to make it spin 360° when clicked. They want to
+just override the library widget, adding the spin code, and not have to
+touch any other code relating to the internal working of the widget
+already provided in the library.
+
+#### Custom Widget Usage
+
+A system integrator wants to implement custom widgets by writing actual
+code. The system integrator wants to be able to integrate the new custom
+widgets into the user interface and into the developer tooling.
+
+#### Template Library
+
+A system integrator wants to be able to add new templates to the system
+via over the air (OTA) updates. The system integrator does not want the
+template to be able to reload automatically after being updated.
+
+### Appearance Customisation
+
+Each system integrator wants to customise the look and feel of
+applications by changing styling such as padding widths, border widths,
+colours, logos, and gradients. The system integrator wants to make said
+changes with the minimum of modifications, especially to the
+source code.
+
+### Different Icon Themes
+
+Each system integrator wants to be able to trivially change the icon
+theme in use across the user interface not only without recompilation,
+but also at runtime.
+
+### Different Fonts
+
+Each system integrator wants to be able to trivially change the font in
+use across the user interface, and bundle new fonts in with variants.
+
+#### OTA Updates
+
+System integrators want to be able to add fonts using over the air (OTA)
+updates. For example, the system integrator wants to change the font in
+use across the user interface of the variant. They send the updated
+theme definition as well as the new font file via an update and want it
+to be registered automatically and be immediately usable.
+
+### Language
+
+The user wants to change the language of the controls of the system to
+their preferred language such that every widget in the UI that contains
+text updates accordingly without having to restart the application.
+
+#### Right-to-Left Scripts
+
+As above, the user wants to change the language of the controls of the
+system, but to a language which is read from right-to-left (Arabic,
+Persian, Hebrew, etc.), instead of left-to-right. The user expects the
+workflow of the user interface to also change to right-to-left.
+
+#### OTA Updates
+
+A system integrator wants to be able to add and improve language support
+via over the air (OTA) updates. For example, the system integrator
+wants to add a new translation to the system. They send the translation
+via an update and want the new language to immediately appear as an
+option for the user to select.
+
+### Animations
+
+A system integrator wants to customise animations for the system. For
+example, they want to be able to change the behaviour of list widgets by
+setting the visual response to kinetic scrolling, and whether there is
+an elastic effect when reaching the end of the items. Other examples are
+the animation used when changing views in an application, and how button
+widgets react when pressed.
+
+The system integrator then expects to see the changes apply across the
+entire system.
+
+### Prototyping
+
+An application author wants to prototype a UI rapidly (see [][UI prototyping]),
+using a WYSIWYG UI development tool (see [][WYSIWYG UI editing]) with access
+to all the widgets in the library, including custom and vendor-specific widgets.
+
+### Day & Night Mode
+
+A user is using the system when it is dark outside and wants the colour
+scheme of the display to change to accommodate the darkness outside, so
+as not to be too bright and dazzle the user. Requiring the user to adapt
+their eyes momentarily to the brightness of the system could be
+dangerous.
+
+### View Management
+
+An application author has several views in their application and doesn’t
+want to have to write a system for managing said views. They want to be
+able to add a workflow and leave the view construction, show and hide
+animations, and view destruction up to the user interface library.
+
+### Display Orientation
+
+A system integrator changes the orientation of the display. They expect
+the user interface to adapt and display normally, potentially using a
+different layout more suited to the orientation.
+
+Note that the adaptation is only expected to be implemented if it is
+easy, it is not expected to be instantaneous, and a restart of the
+system is acceptable.
+
+### Speed Lock
+
+Laws require that when the vehicle is moving some features be disabled
+or certain behaviour modified.
+
+#### Geographical Customisation
+
+Different geographical regions have different laws regarding what
+features and behaviours need to be changed, so this must be customisable
+(only) by the system integrator when it is decided for which market the
+vehicle is destined.
+
+#### System Enforcement
+
+Because these restrictions are government laws, system integrators don’t
+want to rely on application authors to respect said restrictions, and
+instead want the system to enforce them automatically.
+
+## Non-Use Cases
+
+A variety of non-use cases for UI customisation are given below.
+
+### Theming Custom Clutter Widgets
+
+An application developer wants to write their own widget using the
+Clutter library directly. They understand that standard variant theming
+will not apply to any custom widget and any integration will have to be
+achieved manually.
+
+Note that although unsupported directly by the Apertis user interface
+library, it is possible for application authors to implement this higher
+up in the application itself.
+
+### Multiple Monitors
+
+A system integrator wants to connect two displays (for example, one via
+HDMI and one via LVDS) and show something on each one, for example when
+developing on a target board like the i.MX6. They understand this is not
+supported by Apertis.
+
+### DPI Independence
+
+A system integrator uses a display with a different DPI. They understand
+that they should not expect the user interface to adapt: it may display
+too big or too small relative to the old DPI.
+
+### Display Size
+
+A system integrator changes the resolution of the display. They
+understand that they should not expect the user interface to adapt and
+display normally, potentially using a different layout more suited to
+the new display size.
+
+### Dynamic Display Resolution Change
+
+A system integrator wants to be able to change the resolution of the
+display or resize the user interface. They understand that a dynamic
+change in the user interface is not supported in Apertis.
+
+## Requirements
+
+### **Variant set at Compile-Time**
+
+Multiple variants should be supported on the system but the variant in
+use should be decided at application compile-time such that it cannot be
+changed later (see [][Fixed variants]).
+
+### **CSS Styling**
+
+The basic appearance of the widgets should be stylable using CSS,
+changing the look and feel as much as possible with no modifications to
+the source code required (see [][Appearance customisation], [][Different icon themes]).
+
+The changes possible using CSS do not need to be incredibly intrusive
+and are limited to the basic core CSS properties. For example, changing
+colour scheme (background-color, color), icon theme & logos
+(background-image), fonts (font-family, font-size), and spacing (margin,
+padding).
+
+More intrusive changes to the user interface should be achieved using
+templates (see [][Templates]) instead of CSS changes.
+
+For example, a system integrator wants to change the colour of text in
+buttons. This should be possible by changing some CSS.
+
+### Templates
+
+CSS is appropriate for changing simple visual aspects of the user
+interface but does not extend to allow for structural modifications to
+applications (see [][CSS styling]). Repositioning widgets or even changing
+which widgets are to be used is not possible with CSS and should be
+achieved using templates (see [][Templates]).
+
+There are multiple layers of widgets available for use in applications.
+Starting from the lowest, simplest level and moving higher,
+encapsulating more with each step:
+
+ - buttons, entries, labels, …
+
+ - buttons with labels, radio buttons with labels, …
+
+ - lists, tree view, …
+
+ - complete views, or *templates*.
+
+Templates are declarative representations of the layout of the user
+interface which are read at runtime by the application. Using templates
+it is possible to redesign the layout, look & feel, and controls of the
+application without recompilation.
+
+The purpose of templates is to reduce the effort required by an
+application author to configure each widget, and to maintain the same
+look and feel across the system.
+
+#### Catalogue of Templates
+
+There should be a catalogue of templates provided by the library which
+system integrators can use to design their applications (see [][Template library]).
+The layouts of applications should be limited to the main use
+cases.
+
+For example, one system integrator could want the music application to
+be a simple list of albums to choose from, while another could want the
+same information represented in a grid. This simple difference should be
+possible by using different templates already provided by the user
+interface library.
+
+#### Template Extension
+
+In addition to picking layouts from user interface library-provided
+templates, it should also be possible to take existing templates and
+change them with minimal copy & pasting (see [][Template extension]).
+
+For example, a system integrator could want to change the order of
+labels in a track information view. The default order in the
+library-provided template could be track name and then artist name, but
+said system integrator wants the artist name first, followed by the
+track name. This kind of change is too fundamental to do in CSS so a
+template modification is required. The system integrator should be able
+to take the existing library-provided template and change the order with
+minimal modifications and minimal copy & pasting.
+
+#### Template Modularity
+
+Templates should be as modular as possible in order to break up the
+parts of a design into smaller parts. This is useful when changes
+are required by a system integrator (see [][Templates], [][Template extension]).
+If the entire layout is in one template, it is difficult to make small
+changes without having to copy the entire original template.
+
+Fine-grained modularity which leads to less copy & pasting is optimal
+because it makes the template more maintainable, as there is only one
+place to change if a bug is discovered in the original library-provided
+template.
+
+#### Custom Widgets in Templates
+
+A system integrator should be able to use custom widgets they have
+written for the particular variant in the template format (see [][Custom widget usage]).
+Responsibility for the compatibility of custom widgets with the rest of
+the user interface lies with the widget author.
+
+#### Documentation
+
+With a library of widgets and models available to the system integrator,
+the available widgets and the ways to interact with them should be well
+documented (see [][Template library]). If signals, signal callbacks, and
+properties are provided, these should all be listed in the documentation
+for the system integrator to connect to properly.
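+
+As a rough sketch of what such a documented, swappable contract could
+look like in GObject terms (anticipating the widget interfaces discussed
+next; the names here are illustrative assumptions, based on the
+`LightwoodCollection` interface mentioned later in this document):
+
+---
+#include <glib-object.h>
+
+/* A common interface that list-like widgets (rollers, flat lists, …)
+ * could implement, so that one can be swapped for another in a
+ * template without changing application code. */
+G_DECLARE_INTERFACE (LightwoodCollection, lightwood_collection,
+                     LIGHTWOOD, COLLECTION, GObject)
+
+struct _LightwoodCollectionInterface
+{
+  GTypeInterface parent_iface;
+
+  /* Attach the model (for example a LightwoodAppModel) providing the
+   * data to display; the widget only renders it. */
+  void (*set_model) (LightwoodCollection *self, GObject *model);
+
+  /* Class closure for the "activated" signal, emitted when an item is
+   * chosen; documented so template authors know what to connect to. */
+  void (*activated) (LightwoodCollection *self, guint index);
+};
+---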
+
+#### Widget Interfaces
+
+When swapping a widget out for another one in a template it is important
+that the API matches so the change will work seamlessly. To ensure this,
+widgets should implement core interfaces (button, entry, combobox, etc.)
+so that when swapped out, views will continue to work as expected using
+the replacement widget. Applications should only use API which is
+defined on the interface, not on the widget implementation, if they wish
+for their widgets to be swappable for those in another variant.
+
+As a result, system integrators swapping widgets out for replacements
+should check the API documentation to ensure that the interface
+implemented by the old widget is also implemented in the new widget.
+This will ensure compatibility.
+
+#### GResources
+
+If an application is loading a lot of templates from disk there could be
+input/output overhead in loading them. A way around
+this is to use [GResource]s. GResources are useful for storing
+arbitrary data, such as templates, either packed together in one file,
+or inside the binary as literal strings. It should be noted that if
+linked into the binary itself, the binary will have to be rebuilt every
+time the template changes. If this is not an option, saving the
+templates in an external file using the `glib-compile-resources` binary is
+necessary.
+
+The advantage of linking resources into the binary is that once the
+binary is loaded from disk there is no more disk access. The
+disadvantage, as mentioned before, is that rebuilding is
+required every time resources change. The advantage of putting resources
+into a single file is that they are only required to be mapped into
+memory once and can then be shared among other applications.
+
+### **MVC Separation**
+
+There should be a functional separation between the data provider (*model*),
+the way in which it is displayed in the user interface (*view*), and the
+widgets for interaction and data manipulation (*controller*) (see
+example in [][Templates]). The model should be a separate object not
+depending on any visual aspect of the widget.
+
+Following on from the previous example (in [][Templates]), the model would
+be the list of pictures on the system, and the two variants would use
+different widgets, but would attach the same model to each widget. This
+is the key behind being able to swap one widget for another without
+making code changes.
+
+This separation would push the *model* and *controller* responsibility
+to the user interface library, and an application would only depend on
+the *model* in that it provides the data to fill said model.
+
+### Language Support
+
+All widgets should be linked into a language translation system such
+that it is trivial not only for the user to change language (see
+[][Language]), but also for new translations to be added and existing
+translations updated (see [][Ota updates]).
+
+### Animations
+
+Animations in use in widgets should be configurable by the system
+integrator (see [][Animations] for examples). These animations should be
+used widely across the system to ensure a consistent experience.
+Applications should expose a fixed set of transitions which can be
+animated so system integrators can tell what can be customised.
+
+### Scripting Support
+
+The widgets and templates should be usable from a UI design format, such
+as [GtkBuilder] or [ClutterScript]. This includes custom widgets.
This would
+enable application authors to quickly prototype applications
+(see [][Prototyping]).
+
+### Day & Night Mode
+
+The user interface should change between light and dark mode when it
+becomes dark outside the vehicle, in order not to shine too brightly and
+distract the user (see [][Day night mode]).
+
+### View Management
+
+A method of managing application views (see [][View]) should be provided to
+application authors (see [][View management]). On startup the application
+should provide its views to the view manager. From this point on the
+responsibility of constructing views, switching views, and showing view
+animations should be that of the view manager. The view manager should
+pre-empt the construction of views, but also be sensitive to memory
+usage so as not to load all views simultaneously.
+
+### Speed Lock
+
+Some features and certain behaviour in the user interface should be
+disabled or modified respectively when the vehicle is moving (see [][Speed lock]).
+It should be possible to customise whether each item listed below
+is disabled or not, as this can depend on the target market of the vehicle
+(see [][Geographical customisation]). Additionally, it should be up to the
+system to enforce the disabling of the following features; this should
+not be left completely up to application authors (see [][System enforcement]).
+
+#### Scrolling Lists
+
+The behaviour of gestures in scrolling lists should be altered to remove
+fast movements with many screen updates. While still retaining
+similar functionality, gestures should cause far fewer visual changes.
+For example, swiping up would no longer start a kinetic scroll, but
+would move the page up one tabulation.
+
+#### Text
+
+Text displayed should either be masked or altered to remove the
+distraction of reading it while operating the vehicle, depending on the
+nature of the text.
+
+ - SMS messages and emails can have dynamic content so they should be
+   hidden or masked.
+
+ - Help text or dialog messages should have alternate, shorter messages
+   to be shown when the speed lock is active.
+
+#### List Columns
+
+Lists with columns should limit the number of columns visible to ensure
+superfluous information is not distracting. For example, in a contact
+list, instead of showing both name and telephone number, the list
+should show only the name.
+
+#### Keyboard
+
+The keyboard should be visibly disabled and not usable.
+
+Additionally, default values should be available so that operations can
+succeed without the use of a keyboard. For example, when adding a
+bookmark while the vehicle is stationary, the user will be able to choose
+a name for the new bookmark before saving it. When the vehicle is moving
+the bookmark will be automatically saved under a default name without
+the user being prompted for the name. The name (and other use cases of
+default values) should be modifiable later.
+
+#### Pictures
+
+Superfluous pictures used in applications as visual aids which could be
+distracting should be hidden. For example, in the music application,
+album covers should be hidden from the user.
+
+#### Video Playback
+
+Video playback must either be paused or the video masked (while the
+audio continues to play).
+
+#### Map Gestures
+
+As with kinetic scrolling in lists (see [][Scrolling lists]), the gestures
+in the map widget should make fewer visual changes and reduce the number
+of distractions for the user.
+Similar to the kinetic scroll example, the
+map view should move by a fixed distance instead of following the user's
+input.
+
+#### Web View
+
+Any web view should be masked and should not show any content.
+
+#### Insensitive Widgets
+
+When the aforementioned functionality is disabled by the speed lock, it
+should be made clear to the user what has been modified and why.
+
+## Approach
+
+### Templates
+
+The goal of templates is to allow an application developer to change the
+user interface of their application without having to change the
+source code. These are merely templates and have no way of implementing
+logic (if/else statements). If logic is required, the widget code itself
+must be customised (see [][Custom widgets]).
+
+#### Specification in ClutterScript
+
+ClutterScript is a method for creating user interfaces from JSON files.
+An example is shown below which describes the variant A application chooser
+user interface:
+
+```json
+[{
+  "id": "model-categories",
+  "type": "LightwoodAppCategoryModel"
+},
+{
+  "id": "model-apps",
+  "type": "LightwoodAppModel"
+},
+
+{
+  "id": "window",
+  "type": "LightwoodWindow",
+  "children": [
+    {
+      "id": "roller-categories",
+      "type": "LightwoodRoller",
+      "model": "model-categories",
+      "app-list": "roller-apps",
+      "signals": [
+        { "name": "activated", "handler": "category_activated_cb" }
+      ]
+    },
+    {
+      "id": "roller-apps",
+      "type": "LightwoodRoller",
+      "model": "model-apps",
+      "signals": [
+        { "name": "activated", "handler": "app_activated_cb" }
+      ]
+    }
+  ]
+}]
+```
+
+The first two objects created (`model-categories` and `model-apps`) are
+models for the application categories available on the system and the
+applications available on the system, as indicated by their class names
+(`LightwoodAppCategoryModel` and `LightwoodAppModel` respectively). These
+models are not widgets visible in the user interface, but proper widgets
+refer to them later in the template.
+
+The next entry describes the main window in the user interface, inside
+of which there are two children that are both of type `LightwoodRoller`.
+Although these widgets are of the same type, they are different instances
+and they have
+been given different models. The first roller widget (on the left) has
+been given the `model-categories` model and the second roller widget (on
+the right) has been given the `model-apps` model, both created at the
+beginning of the JSON file.
+
+Additionally, the `LightwoodRoller::activated` signal is connected on
+both rollers to different callbacks. The signal callback names are
+listed in the application documentation. In this case, when the
+left-hand roller with categories is changed (activated), the right-hand
+roller with applications is updated (set by the `app-list` property).
+
+Another example to compare is given below with the B-variant application
+chooser user interface:
+
+```json
+[{
+  "id": "model-apps",
+  "type": "LightwoodAppModel"
+},
+
+{
+  "id": "window",
+  "type": "LightwoodWindow",
+  "children": [
+    {
+      "id": "list-apps",
+      "type": "LightwoodList",
+      "model": "model-apps",
+      "signals": [
+        { "name": "activated", "handler": "app_activated_cb" }
+      ]
+    }
+  ]
+}]
+```
+
+The differences of the B-variant application chooser in comparison to
+the A-variant application chooser are:
+
+1. There is no categories model and no categories roller.
+
+2. The list widget is a `LightwoodList` instead of a `LightwoodRoller`.
+
+   This is a visual difference dictated by the widget implementation
+   and chosen for this variant, but the data backend for both lists (in
+   the model `model-apps`) is unchanged. Both widgets should implement a
+   common `LightwoodCollection` interface.
+
+These are just two examples of how an application chooser could be
+designed. The user interface files contain minimal theming as that is
+achieved in separate CSS files (see [][Theming]).
+
+Typically, applications will come with many templates for system
+integrators to either use, or take guidance from.
+
+#### Properties, Signals, and Callbacks
+
+The GObject properties that can be set, the signals that can be
+connected to, and the signal callbacks that can be used should be
+listed clearly in the application documentation. This way, system
+integrators can customise the look and feel of the application using
+already-written tools.
+
+When changing a template to use a different widget, it might be necessary
+to change the signal callbacks. This largely depends on the nature of
+the change of widget, but signal names and signatures should be as
+consistent as possible across widgets to make such changes as easy as
+possible. If custom callbacks are used in the code of an application,
+and the callback signature changes, recompilation will be necessary. The
+signals emitted by widgets and their type signatures are defined in
+their interfaces, documented in the API documentation.
+
+In the examples above, for instance, a `LightwoodRoller` widget for listing
+applications was changed to a `LightwoodList` and the activated signal
+remained connected to the same `app_activated_cb` callback.
+
+#### Template Inheritance
+
+At the time of writing, ClutterScript has no way of referring to objects
+from other JSON files or of making an object a modified version of
+another. A proposal to modify `ClutterScriptParser` to support this
+feature is as follows (this change would take a couple of days to
+implement):
+
+An `app-switcher.json` file defines the following objects:
+
+```json
+[{
+  "id": "view-header",
+  "type": "LightwoodHeader",
+  "height": 100,
+  "width": 200
+},
+
+{
+  "id": "view-footer",
+  "type": "LightwoodFooter",
+  "height": 80
+},
+
+{
+  "id": "app-switcher",
+  "type": "LightwoodWindow",
+  "color": "#ff0000",
+  "children": [ "view-header", … , "view-footer" ]
+}]
+```
+
+Header and footer objects are defined (`view-header` and `view-footer`), each
+with a `height` property (100 and 80 respectively). An `app-switcher` object
+is also created with the `color` and `children` properties set. Note that
+the `children` property is a list referring to the header and footer
+objects created before. (The ellipsis between said objects marks the omission of
+other objects between header and footer for brevity.)
+
+If a system integrator wanted to give the same appearance to their app
+switcher view but change the height of the header and the
+colour of the main app switcher, there is no need to copy and paste a lot
+of text to redefine all these objects: objects can simply extend previous
+definitions.
+For example, in a `my-app-switcher.json`:
+
+```json
+[{
+  "external-uri": "file:///path/to/app-switcher.json",
+  "id": "view-header",
+  "height": 120
+},
+
+{
+  "external-uri": "file:///path/to/app-switcher.json",
+  "id": "app-switcher",
+  "color": "#ffff00"
+}]
+```
+
+Referencing objects defined in other files is achieved by specifying
+the `external-uri` property pointing to the other file, and the `id`
+property for selecting the external object.
+
+In this example, the `view-header` object is extended and the `height`
+property is set to 120. All other properties on the original object
+remain untouched. For example, the `width` property of the header remains
+at 200.
+
+It is also possible to simply refer to objects in other files without any
+changes. Each external object must be referred to separately, as they are
+not automatically brought into scope after the first `external-uri`
+reference. For example:
+
+```json
+{
+  "id": "example-with-children",
+  "children": [
+    "first-child",
+    {
+      "id": "second-child",
+      "type": "ExampleType"
+    },
+    {
+      "external-uri": "file:///path/to/another.json",
+      "id": "third-child"
+    }
+  ]
+}
+```
+
+In this example, this object has three children:
+
+1. An object called `first-child`, defined elsewhere in the JSON file.
+
+2. An object called `second-child`, of type `ExampleType`, defined inline.
+
+3. An object called `third-child`, defined in `another.json`.
+
+In these examples, the `external-uri` used the `file://` URI scheme, but
+other schemes supported by GIO can be used. For example, templates stored in
+[][GResources] can be referenced using the `resource://` URI
+scheme.
+
+Application authors should not use templates and inheritance so excessively
+that every single object is in a separate file. This will cause
+more disk activity and could potentially slow down the application.
+Templates should be broken up when clarity is in question or when a
+non-trivial object is to be reused across other views.
+
+#### Widget Factories
+
+If a system integrator wants to replace a widget everywhere across the
+user interface, they can use a widget factory to replace all instances
+of said old widget with the new customised one. This is achieved by
+using a factory which overrides the type parameter in a ClutterScript
+object.
+
+For example, if a system integrator wants to stop using `LightwoodButton`
+and instead use the custom `FancyButton` class, there are no changes
+required to any template, but an entry is added to the widget factory to
+produce a `FancyButton` whenever a `LightwoodButton` is requested. Templates
+can continue referring to `LightwoodButton` or can explicitly request a
+`FancyButton`, but in both cases a `FancyButton` will be created. If an
+application truly needs the older `LightwoodButton`, it needs to create a
+subclass of `LightwoodButton` which is not overridden by anything, and then refer to
+that explicitly in the template.
+
+#### Custom Widgets
+
+Apertis widgets can be subclassed by system integrators in variants and
+used by application developers by creating shared libraries linking to
+the Apertis widget library. Applications then link to said new library,
+and once the new widgets are registered with the GObject type system
+they can be referred to in ClutterScript user interface files. If a
+system integrator wants a radically different widget, they can write
+something from scratch, ensuring that it implements the appropriate interface.
+Subclassing existing widgets is for convenience but not technically
+necessary.
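+
+As an illustration, a from-scratch widget only needs to implement the
+relevant interface to be usable from templates. The following is a minimal
+sketch: the `LightwoodCollection` interface is mentioned above, but its
+exact methods are not defined in this document, so the `set_model()`
+virtual method and all type names other than the Lightwood ones are
+assumptions made for the purpose of the example:
+
+```C
+/* Sketch of a from-scratch widget implementing a common collection
+ * interface so that it can stand in for LightwoodList or LightwoodRoller
+ * in templates. The set_model() virtual method is hypothetical. */
+
+static void fancy_grid_collection_init (LightwoodCollectionInterface *iface);
+
+G_DEFINE_TYPE_WITH_CODE (FancyGrid, fancy_grid, CLUTTER_TYPE_ACTOR,
+                         G_IMPLEMENT_INTERFACE (LIGHTWOOD_TYPE_COLLECTION,
+                                                fancy_grid_collection_init))
+
+static void
+fancy_grid_set_model (LightwoodCollection *collection,
+                      LightwoodModel *model)
+{
+  /* attach the model and rebuild the visible items */
+}
+
+static void
+fancy_grid_collection_init (LightwoodCollectionInterface *iface)
+{
+  iface->set_model = fancy_grid_set_model;
+}
+
+static void
+fancy_grid_init (FancyGrid *self)
+{
+}
+
+static void
+fancy_grid_class_init (FancyGridClass *klass)
+{
+}
+```
+
+Once registered with the type system, `FancyGrid` can be referenced from
+ClutterScript templates, or from a widget factory entry, exactly like the
+stock widgets.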
+
+Apertis widgets should be as modularised as possible, splitting
+functionality into virtual methods where a system integrator might want
+to override it. For example, if a system integrator wants the roller
+widget to have a different activation animation depending on the number
+of items in the model, they could create a roller widget subclass,
+override the appropriate virtual method (in this case `activate`) and
+update the animation as appropriate:
+
+```C
+G_DEFINE_TYPE (MyRoller, my_roller, LIGHTWOOD_TYPE_ROLLER)
+
+static void
+my_roller_init (MyRoller *self)
+{
+}
+
+static gboolean
+my_roller_activate (LightwoodRoller *roller,
+                    gint item_id)
+{
+  LightwoodModel *model;
+
+  model = lightwood_roller_get_model (roller);
+
+  if (lightwood_model_get_n_items (model) > 5) {
+    /* change animation */
+  } else {
+    /* reset animation */
+  }
+
+  /* chain up to the parent class implementation */
+  return LIGHTWOOD_ROLLER_CLASS (my_roller_parent_class)->activate (roller, item_id);
+}
+
+static void
+my_roller_class_init (MyRollerClass *klass)
+{
+  LightwoodRollerClass *roller_class = LIGHTWOOD_ROLLER_CLASS (klass);
+
+  roller_class->activate = my_roller_activate;
+}
+```
+
+(Note that chaining up goes through `my_roller_parent_class`: fetching the
+instance's own class with `LIGHTWOOD_ROLLER_GET_CLASS()` would call the
+override itself and recurse endlessly.)
+
+As another example, if the system integrator wants to change another part
+of the roller when scrolling starts, the appropriate signal can be
+connected to:
+
+```C
+G_DEFINE_TYPE (MyRoller, my_roller, LIGHTWOOD_TYPE_ROLLER)
+
+static void
+my_roller_scrolling_started (MyRoller *self,
+                             gpointer user_data)
+{
+  /* scrolling has started here */
+}
+
+static void
+my_roller_init (MyRoller *self)
+{
+}
+
+static void
+my_roller_constructed (GObject *obj)
+{
+  /* chain up to the parent class implementation */
+  G_OBJECT_CLASS (my_roller_parent_class)->constructed (obj);
+
+  g_signal_connect (obj, "scrolling-started",
+                    G_CALLBACK (my_roller_scrolling_started), NULL);
+}
+
+static void
+my_roller_class_init (MyRollerClass *klass)
+{
+  GObjectClass *object_class = G_OBJECT_CLASS (klass);
+
+  object_class->constructed = my_roller_constructed;
+}
+```
+
+In the template, this variant would stop referring to `LightwoodRoller`
+and instead would use `MyRoller`, or update the widget factory entry (see
+[][Widget factories]).
+
+### Models
+
+Data that is to be displayed to the user in list widgets should be
+stored in an orthogonal model object. This object should have no
+dependency on anything visual (see [][MVC separation]).
+
+The actual implementation of the model should be of no importance to the
+widgets, and only basic model interface methods should be called by any
+widget. It is suggested to use the [GListModel] interface as said
+model interface, as it provides a set of simple type-safe methods to
+enumerate, manipulate, and be notified of changes to the model.
+
+As `GListModel` is only an interface, an implementation of said interface,
+such as `GListStore`, should be used, ensuring that all methods and
+signals are implemented.
+
+### Theming
+
+Using the `GtkStyleContext` object from GTK+ is wise for styling widgets,
+as it can aggregate styling information from many sources, including
+CSS. GTK+'s CSS parsing code is advanced and well tested: GTK+ itself
+switched its [Adwaita] default theme to pure CSS some time ago, replacing
+theme engines that required C code to be written to customise appearance.
+
+Said parser and aggregator support multiple layers of overrides.
+This means that CSS rules can be given priorities and rules are followed in a
+specific order (for example, theme rules are set first, can be overridden
+by variant rules and, where necessary, can in turn be overridden by
+application rules). This is ideal for Apertis, where themes set defaults and
+variants need only make changes where necessary.
+
+#### Clutter Widgets
+
+The `GtkApertisStylable` interface is a mixin for any GObject to enable
+use of a `GtkStyleContext`. It is an Apertis-specific interface and
+therefore not a candidate for upstreaming, and will need maintaining as
+the CSS machinery in GTK+ changes over time.
+
+Every existing Clutter widget will have to be manually taught to use the
+style context, and any special requirements will also need to be applied
+from the style context as necessary.
+
+#### Theme Changes
+
+Applications should listen to a documented [GSettings] key for
+changes to the theme and icon theme. Changes to the theme should update
+the style properties in the `GtkStyleContext`, which will trigger a widget
+redraw; changes to the icon theme should update the icon paths and
+trigger icon redraws.
+
+### Language Support
+
+GNU [gettext] is a well-known system for managing translations of
+applications. It provides tools to scan source code looking for
+translatable strings, and a library to resolve said strings against
+language files which are easily updated without touching the source code
+of said applications.
+
+#### Language Changes
+
+Applications should listen to a documented GSettings key for changes to
+the user-chosen language, then re-translate all strings and redraw.
+
+#### Updating Languages
+
+Language files for GNU gettext saved into the appropriate directory can
+be used immediately with no other changes to the application.
+Over the air (OTA) updates can contain updated language files which get
+saved to the correct location and are loaded the next time the
+application is started.
+
+### Day & Night Mode
+
+Inspired by GTK+'s *[dark mode]*, variant CSS should provide a `dark`
+class for widgets to be used in night mode. If the `dark` class is not set,
+the user interface should be in day mode. [CSS transitions]
+should make the animation smooth.
+
+A central GSettings key should be read to know when the system is in day
+or night mode. It will be modifiable for testing and in development.
+
+### View Management
+
+At the time of writing, Clutter does not have a built-in view management
+system. GTK+ has [GtkStack] for managing views and displaying
+animations when moving between one view and another. The useful parts
+of `GtkStack` could be migrated to Clutter (subject to suitable licensing)
+to re-use its functionality and existing user testing, rather than wasting
+effort on reimplementing everything from scratch. Existing view management
+systems (for example, in `libthornbury`) should also be considered for this
+migration task.
+
+### Speed Lock
+
+There should be a system-operated service that determines when the
+vehicle is moving and when it is stationary. Based on this, the Apertis
+widgets and applications should change their behaviour when and where
+appropriate.
+
+There should be a GSettings key which indicates whether the speed lock
+is active or not. This key should only be modifiable by said
+system-operated service and should be readable by the entire system.
+
+#### Apertis Widgets
+
+Applications that use Apertis widgets extensively should have very
+little to modify to support the speed lock.
+Apertis widgets should read
+and monitor the GSettings key to change their content when necessary
+(a minimal sketch of this monitoring pattern is given at the end of this
+section):
+
+ - Lists with kinetic scrolling – disable the kinetic scrolling (see
+   [][Scrolling lists]).
+
+ - Text – very long texts in text views or label widgets should be
+   hidden if there is no alternative provided (see [][Text]).
+
+ - Keyboard – do not show the keyboard and provide feedback to the
+   application as to whether the keyboard could appear or not (see
+   [][Keyboard]).
+
+ - Pictures – mask the picture shown in the picture widget (see
+   [][Pictures]).
+
+ - Video playback – either pause the video completely or just mask the
+   video and keep the audio playing (see [][Video playback]).
+
+ - Map – disable the kinetic scrolling (see [][Map gestures]).
+
+ - Web view – mask the contents entirely (see [][Web view]).
+
+Apertis widgets should fill text widgets with contents that can
+differ depending on whether the speed lock is active or not.
+
+#### List Columns
+
+The number of columns visible should be reduced to remove superfluous
+information when the speed lock is active (see [][List columns]). The nature
+of every list can be different, and detecting superfluous information
+automatically is impossible. There should therefore be a way for
+application authors to specify which columns should be hidden, or it
+should be left up to the application itself. If the latter is not an
+option (see the enforcement comments in [][Speed lock]), the entire list widget
+should be masked to hide its contents.
+
+#### Keyboard
+
+As mentioned in [][Keyboard], applications
+should deal with the possibility that the keyboard may not be available
+at any given time, if the speed lock is active. In the case that
+the keyboard request is denied, the application should change its user
+experience slightly to accommodate this, such as in the example with
+bookmarks given previously.
+
+The change of user experience also means there must be other ways in
+which users can edit items that were named using default values, once the
+speed lock has been disabled.
+
+#### Templates
+
+Apertis-provided templates should have versions for when the speed lock
+is activated, and Apertis widgets should switch to these templates
+accordingly.
+
+#### Insensitive Widgets
+
+As highlighted in [][Insensitive widgets], it should be made obvious to the
+user when functionality is disabled, and why. There should be a uniform
+visual change to widgets when they have been made insensitive so that users
+can immediately recognise what is happening.
+
+A documented CSS class should be added to widgets that are made
+insensitive by the speed lock so that said widgets follow an identical
+change in display.
+
+#### Notifications
+
+Pop-up notifications or a status bar message should make it clear to the
+user that the speed lock is active and, if appropriate, highlight the
+current functionality that has been disabled.
+
+#### Masking Unknown Applications
+
+Applications can technically implement custom widgets and not respect
+the rules of the speed lock. As a result, applications which haven't
+been vetted by an approved authority should not be allowed to run when
+the speed lock is active. When they are already running and the speed
+lock is activated, they should be masked and the user should not be able
+to interact with them.
+
+This behaviour should be customisable, and possibly only enabled in
+regions in which laws are very strict about speed lock restrictions.
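+
+The monitoring pattern referenced in [][Apertis widgets] could look
+roughly like the following minimal sketch; the GSettings schema ID and
+key name are illustrative assumptions, not the real Apertis ones:
+
+```C
+/* Sketch: reacting to a speed-lock GSettings key.
+ * The schema ID "org.apertis.SpeedLock" and the key "active" are
+ * hypothetical names used for illustration. */
+#include <gio/gio.h>
+
+static void
+speed_lock_changed_cb (GSettings *settings,
+                       const gchar *key,
+                       gpointer user_data)
+{
+  if (g_settings_get_boolean (settings, key)) {
+    /* speed lock active: mask pictures, shorten texts, hide keyboard */
+  } else {
+    /* speed lock inactive: restore the full user interface */
+  }
+}
+
+static void
+watch_speed_lock (void)
+{
+  /* the GSettings object must stay alive for the signal to keep firing */
+  GSettings *settings = g_settings_new ("org.apertis.SpeedLock");
+
+  g_signal_connect (settings, "changed::active",
+                    G_CALLBACK (speed_lock_changed_cb), NULL);
+
+  /* apply the initial state */
+  speed_lock_changed_cb (settings, "active", NULL);
+}
+```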
+
+## References
+
+### GTK+ Migration
+
+In an older version of this document it was proposed that a move from
+Clutter to GTK+ might be wise. It has been decided that, for the time
+being, a move is unwise due to the immature nature of the GTK+ Scene
+Graph Kit.
+
+The following sections were removed from the main body of this
+document and have been left here for future reference.
+
+#### GTK+ or Clutter
+
+The following suggestions are possible using either the GTK+ or Clutter
+libraries. Existing code is currently written in Clutter, but a move to
+GTK+ could be wise because GTK+ is still highly used and maintained,
+whereas Clutter is less used and less maintained. The maintainers of
+Clutter have even announced that planned future additions to GTK+ would
+[deprecate Clutter]. Although not set in stone, the deprecation is
+[planned][GTK-roadmap] for the 3.20 release of GTK+, due in
+March 2016.
+
+It is worth noting that Clutter widgets can be embedded inside GTK+
+applications, and GTK+ widgets can be embedded inside Clutter
+applications, but there are many problems with input and with ensuring GTK+
+functions are only called from GTK+ callbacks, so following this path is
+likely not worth the eventual problems.
+
+For completeness, the following sections with toolkit-specific
+approaches are split into two such that both GTK+ and Clutter paths can
+be considered.
+
+#### Specification in GtkBuilder
+
+GtkBuilder is a method for creating user interfaces from XML files. An
+example is shown below which describes the A-variant application chooser
+user interface:
+
+```xml
+<interface>
+  <object class="LightwoodAppCategoryModel" id="model_categories" />
+  <object class="LightwoodAppModel" id="model_apps" />
+  <object class="LightwoodWindow" id="window">
+    <child>
+      <object class="GtkBox" id="hbox1">
+        <property name="homogeneous">True</property>
+        <property name="orientation">GTK_ORIENTATION_HORIZONTAL</property>
+        <child>
+          <object class="LightwoodRoller" id="roller_categories">
+            <property name="model">model_categories</property>
+            <property name="app-list">roller_apps</property>
+            <signal name="activated" handler="category_activated_cb" />
+          </object>
+        </child>
+        <child>
+          <object class="LightwoodRoller" id="roller_apps">
+            <property name="model">model_apps</property>
+            <signal name="activated" handler="app_activated_cb" />
+          </object>
+        </child>
+      </object>
+    </child>
+  </object>
+</interface>
+```
+
+The first two objects created (`model_categories` and `model_apps`) are
+models for the application categories available on the system and the
+applications available on the system, as indicated by their class names
+(`LightwoodAppCategoryModel` and `LightwoodAppModel` respectively). These
+models are not widgets visible in the user interface, but proper widgets
+refer to them later in the template.
+
+The next entry describes the main window in the user interface, inside
+of which there is a horizontal box (with some style properties set)
+with two children that are both of type `LightwoodRoller`. Although these
+widgets are of the same type, they are different instances and they have
+been given different models. The first roller widget (on the left) has
+been given the `model_categories` model and the second roller widget (on
+the right) has been given the `model_apps` model, both created at the
+beginning of the XML file.
+
+Additionally, the `LightwoodRoller::activated` signal is connected on
+both rollers to different callbacks.
+The signal callback names are
+listed in the application documentation. In this case, when the
+left-hand roller with categories is changed (activated), the right-hand
+roller with applications is updated (set by the `app-list` property).
+
+Another example to compare is given below with the B-variant application
+chooser user interface:
+
+```xml
+<interface>
+  <object class="LightwoodAppModel" id="model_apps" />
+  <object class="LightwoodWindow" id="window">
+    <child>
+      <object class="LightwoodList" id="list_apps">
+        <property name="model">model_apps</property>
+        <signal name="activated" handler="app_activated_cb" />
+      </object>
+    </child>
+  </object>
+</interface>
+```
+
+The differences of the B-variant application chooser in comparison to
+the A-variant application chooser are:
+
+1. There is no categories model and no categories roller.
+
+2. There is no more box inside the main window widget.
+
+3. The list widget is a `LightwoodList` instead of a `LightwoodRoller`.
+   This is a visual difference dictated by the widget implementation
+   and chosen for this variant, but the data backend for both lists (in
+   the model `model_apps`) is unchanged.
+
+These are just two examples of how an application chooser could be
+designed. The user interface files contain minimal theming as that is
+achieved in separate CSS files (see [][Theming]).
+
+Typically, applications will come with many templates for system
+integrators to either use, or take guidance from.
+
+#### GTK+ Widgets
+
+Support for `GtkStyleContext` inside GTK+ widgets is already present.
+Widgets inside the GTK+ library (and therefore also their subclasses)
+already talk to the style context and are drawn according to custom
+styling.
+
+New GTK+ widgets with special requirements would need to get the
+appropriate style information from the style context and apply it as
+necessary. This is documented in the GTK+ documentation and it is easy
+to find examples of it in the source code.
+
+## Appendix
+
+### Variant Differences
+
+#### Thumbnail View
+
+ - Re-used:
+
+    - Roller
+
+    - Views drawer
+
+ - Differences:
+
+    - In variant A, one needs to go back to the app launcher to start
+      the photo viewer with tags/title/date as they are all separate
+      apps; in B it is in the same app.
+
+#### Detail View
+
+ - Re-used: nothing
+
+ - Differences:
+
+    - In variant A it is a roller; in B it is an individual image.
+
+    - In variant B there is a media info widget; in A it's a roller on
+      the right.
+
+#### List View
+
+ - Re-used: nothing
+
+ - Differences:
+
+    - The roller is different (it displays different information).
+
+    - In variant B one can delete; in A the feature is not present.
+
+#### Full Screen
+
+ - Re-used: nothing
+
+ - Differences:
+
+    - The full screen is a roller in variant A; in B it is a single
+      snapshot.
+
+    - In variant B there are many extra functions; in A these
+      functions are not present.
+
+[GResource]: https://developer.gnome.org/gio/stable/GResource.html
+
+[GtkBuilder]: https://developer.gnome.org/gtk3/stable/GtkBuilder.html#GtkBuilder.description
+
+[ClutterScript]: https://developer.gnome.org/clutter/stable/ClutterScript.html#ClutterScript.description
+
+[GListModel]: https://developer.gnome.org/gio/stable/GListModel.html
+
+[Adwaita]: https://git.gnome.org/browse/gtk+/tree/gtk/theme/Adwaita
+
+[GSettings]: https://developer.gnome.org/gio/stable/GSettings.html
+
+[gettext]: https://www.gnu.org/software/gettext/
+
+[dark mode]: https://developer.gnome.org/gtk3/stable/GtkSettings.html#GtkSettings--gtk-application-prefer-dark-theme
+
+[CSS transitions]: http://www.w3schools.com/css/css3_animations.asp
+
+[GtkStack]: https://developer.gnome.org/gtk3/stable/GtkStack.html
+
+[deprecate Clutter]: https://www.bassi.io/articles/2014/07/29/guadec-2014-gsk/
+
+[GTK-roadmap]: https://wiki.gnome.org/Projects/GTK%2B/Roadmap
+
+[Prototype UIs]: https://github.com/gnome-design-team/gnome-mockups/blob/master/passwords-and-keys/passwords-and-keys.png
+
+[Glade]: https://glade.gnome.org/
diff --git a/content/designs/upstreaming.md b/content/designs/upstreaming.md
new file mode 100644
index 0000000000000000000000000000000000000000..66702f7e080daf8afd307a6a6caa18a9a9503f9f
--- /dev/null
+++ b/content/designs/upstreaming.md
@@ -0,0 +1,130 @@
+# Upstreaming
+
+Upstreaming changes made to a piece of Open Source software provides distinct
+advantages for the author of the changes and for the ecosystem of users of the
+software. The author can expect to see:
+
+* Reduced overhead in on-going maintenance of their code base: With the changes
+  available in the upstream version, the author will no longer need to port the
+  changes when upgrading to a newer version of the software.
+* Reduced risk of incompatible changes and/or features being added to the
+  upstream code base: When making local modifications to a code base, there is a
+  risk that at some future point the local changes will fail to apply without
+  significant rework, or that a comparable feature will be implemented upstream
+  with different semantics, requiring either a conversion to that feature to be
+  carried out or continuing to carry the local change with much reduced
+  advantages.
+* Greater review of the proposed changes: Open source projects tend to review
+  changes before they are merged and, whilst not perfect, such reviews tend to
+  be carried out by developers with a good working knowledge of the code base,
+  which results in a better review than would be achieved in many settings.
+* Potentially increased testing of added features: Other users of the software
+  either evaluating or using the upstreamed features may uncover bugs or
+  security holes in the added features that might otherwise go unnoticed. This
+  results in increased robustness for your users.
+* Potential for further complementary features being added: The addition of a
+  feature may prompt other developers to add complementary features that prove
+  useful to either you or your users.
+
+The upstream project obviously benefits from the addition of features as this
+makes the software appealing to a wider audience and thus is likely to increase
+adoption. Having an active community may also help to increase adoption in its
+own right.
+
+Whilst there are obvious advantages for upstream projects in accepting new
+features, it is also important that they thoroughly review and consider such
+changes.
+Bad changes could lead to instability of the application, negatively
+impacting all users of the project. Additionally, the maintainers are taking on
+the task of providing some maintenance for the added features. It is thus
+important that they ensure such changes comply with the coding conventions and
+other policies to ensure ease of maintenance and continued good health of the
+project.
+
+
+# Where to upstream
+
+As mentioned elsewhere, Apertis is a derivative of Debian, which itself
+packages software projects from many sources outside of Debian.
+Depending on the changes being made, this may present three options as to where
+to upstream the changes:
+
+- Apertis
+- Debian
+- The main project of the software component
+
+The effort required to get changes accepted by each of these places, and the
+associated delay in seeing your changes reach a released version of Apertis,
+are going to differ, often quite drastically, and are frequently inversely
+proportional to the on-going maintenance costs.
+
+For a user of Apertis, submitting changes to Apertis is likely to offer
+the lowest barrier to entry and the fastest route to see the changes reflected
+in an Apertis release. There may be instances where Apertis offers the only
+real option, for example where Apertis is maintaining an older version of the
+project for licensing reasons.
+
+It is likely that Debian will only accept very limited types of change. Some
+changes such as security and bug fixes may be more viable for upstreaming to
+Debian; as a general rule, feature additions are less likely to be accepted,
+though this will depend on how "alive" the upstream project is, what the
+feature is and how the maintainer feels about it (after all, the maintainer
+will be taking on the burden of maintaining the patches). Any patches that are
+accepted by Debian may take longer to be picked up by Apertis (depending on
+exactly where the changes landed).
+
+The last option is to submit changes directly to the upstream project. Clearly,
+packaging-related changes can't be submitted here, as these are not generally
+handled by the upstream project. It is also possible that the version used in
+the current Apertis is behind the upstream development branch, because Apertis
+prioritizes stability over new features. However, the upstream development
+branch is where changes would need to be submitted, even when using this branch
+incurs additional effort to port and test. The advantage of submitting to the
+upstream project is that the changes will require no further ongoing porting to
+newer versions, as they will be in the main code base.
+
+In order to alleviate the significant delay that a user of Apertis is likely to
+see between upstreaming to either Debian or upstream projects and the changes
+appearing in Apertis, it is very likely that Apertis would be willing to accept
+backports of upstream changes to the version currently in use. This provides the
+user with the advantage of being able to immediately use the functionality
+without needing to carry local changes, whilst the Apertis developers can
+expect to only need to carry the changes in the short to medium term until the
+changes filter through.
+
+
+# What can be upstreamed
+
+Most systems are composed of parts that exist to provide a relatively standard
+environment, likely to be (or which could be) common to many devices using
+similar hardware or requiring similar functionality, and parts that provide some
+kind of unique experience or custom logic specific to the device in question.
+It is in the parts that form the standard environment where the advantages of
+upstreaming are most commonly exploited, as these are the parts most likely to
+benefit others and which benefit the most from others' usage.
+
+Examples of items that would be prime candidates for upstreaming include:
+- Drivers for publicly available devices, including peripheral devices and
+  architectures/SoCs previously not supported
+- Previously unsupported functionality provided by supported devices
+- Extensions of functionality for widely usable use cases in user space
+  libraries and applications
+- Bug fixes and security fixes for any upstream component
+
+Ideally, such additions would be submitted to the main project in the first
+instance (with a backport submitted to Apertis once accepted upstream). Should
+upstream submission fail, such patches would be considered on a case-by-case
+basis for addition into Apertis.
+
+**Note**: Upstreaming is generally a process best considered from the outset.
+If upstreaming is planned at an early stage, consider actively
+[working with the community](contribution-process.md#upstream-early-upstream-often)
+during development, as this may streamline and simplify the development and
+upstreaming process.
+
+Modifications and functionality that are not suitable for upstreaming will be
+considered on a
+[case by case basis](contribution-process.md#adding-components-to-apertis).
+Whether they will be considered suitable for integration into the main Apertis
+repositories will depend in part on how broadly useful the changes
+will be to the Apertis user base. At a minimum it would be necessary for such
+changes to comply with the coding standards of the relevant package, not impact
+the operation of Apertis for other users and be suitably licensed.
diff --git a/content/designs/web-engine.md b/content/designs/web-engine.md
new file mode 100644
index 0000000000000000000000000000000000000000..110ba53c92b1baba1d327b6c340e0c606a3016a2
--- /dev/null
+++ b/content/designs/web-engine.md
@@ -0,0 +1,102 @@
+---
+title: Web engine
+short-description: Notes on the web engine provided by Apertis
+authors:
+  - name: Gustavo Noronha Silva
+---
+
+# Web engine
+
+## Introduction
+
+Apertis provides the GTK port of WebKit as its web engine. To
+ensure low maintenance effort, no changes are made to the downstream
+branch; any improvements should go to the upstream project.
+
+### Security maintenance
+
+Like all other major browser engines, the GTK port does not provide long
+term support, so security maintenance comes down to staying up to date.
+
+The general approach Apertis takes is to follow whatever Debian provides.
+The project may also import a new upstream release that has not been
+made available in Debian yet if an important fix is available.
+
+### Customization
+
+Apertis has made a decision not to customize the engine,
+with the goal of keeping maintenance efforts to a minimum. Whenever a
+feature is desired, it should be proposed and contributed directly
+upstream.
+
+#### White-listing and black-listing
+
+There has been interest in maintaining a black-list of web applications
+(or pages) that have misbehaved. That would cover the case in which the
+browser gets killed because it stopped responding and the scripts
+watchdog was not able to restore it to a working state: the offending web
+apps or pages would then not be loaded automatically upon startup, so they
+could not cause the browser to go unresponsive again.
+
+[Web] \(codenamed [Epiphany]), the GNOME web browser, maintains a
+session file that stores information about all loaded pages, such as
+title, URL, and whether they are currently loading or not. If Web is
+quit unexpectedly, it will refuse to automatically load any pages that
+were still marked as loading. This same approach could be used by the
+Apertis web browser to not load those pages automatically or to create a
+blacklist database.
+
+The white-list, on the other hand, would be used to enable applications
+to use lots of resources for a long time without getting killed by this
+infrastructure in what could be considered a false positive. A
+white-list can easily be implemented: keeping a list of applications
+that are allowed to go over the limits should be enough.
+
+#### Rendering of non-web documents
+
+Several kinds of documents that are not strictly web documents are
+available on web sites for viewing and download. Some of these types of
+documents, such as PDFs, have become so common that some browsers embed
+a viewer.
+
+WebKit itself does not have support for rendering those documents, and
+the WebView widget provided by WebKitGTK does not support any
+kind of custom rendering. Applications and browsers that use the engine
+can, however, embed a different widget alongside the WebView if they would
+like to allow viewing PDFs and other kinds of documents in the same user
+interface.
+
+## Scheduled and potential future work
+
+### Web runtime
+
+There is interest in providing developers with a way to write
+applications using web technologies for Apertis. While this is out of
+the scope of this design, a small description of existing technologies
+and how they can be applied follows. Collabora can help in the future
+with a more detailed analysis of what work needs doing, and with the
+specification and development of a solution.
+
+A runner for web applications would ideally create a process for each
+separate application that will be executed, and use application-specific
+locations for storing data such as caches and the databases for the
+features described above – meaning it would not require any kind of
+special privilege: it would be a regular application. This also means
+permissions and resource limits can be set individually for each web
+application.
+
+The fact that more than one process would be executed does not imply a
+lot of memory overhead, since shared libraries are only loaded in memory
+once for all processes that use them. Separate processes also have several
+advantages, such as making the management of application permissions easier
+and avoiding one application interfering with others.
+
+Collabora believes that the best way to provide platform APIs to the
+web applications would be through custom bindings for relevant interfaces.
+That ensures the APIs will feel native to the JavaScript environment,
+and will significantly reduce the attack surface presented to pages,
+compared to a solution that makes all native libraries available.
+
+[Web]: https://live.gnome.org/Design/Web
+
+[Epiphany]: https://live.gnome.org/Epiphany
diff --git a/content/designs/web-portal-caching.md b/content/designs/web-portal-caching.md
new file mode 100644
index 0000000000000000000000000000000000000000..ae53d934ac7c16311dc480296c06a2fd13bf8c63
--- /dev/null
+++ b/content/designs/web-portal-caching.md
@@ -0,0 +1,223 @@
+---
+title: Web portal caching
+short-description: Analysis of caching strategies for web application portals (general-design)
+authors:
+  - name: Emanuele Aina
+---
+
+# Web portal caching
+
+## Introduction
+
+The purpose of this document is to evaluate the available strategies
+to implement a custom, single-purpose browser
+restricted to a single portal website that hosts several HTML/JS applications.
+
+The portal and the visited applications should be available
+even if no Internet connection is available.
+
+If a connection to the Internet is available,
+the locally-stored contents should be refreshed.
+
+Locally-stored copies should be used to speed up loading
+even when the connection to the Internet is available.
+
+The portal and the applications store all their runtime data
+using the [`localStorage`][Web Storage API]
+or [IndexedDB][Indexed Database API] mechanisms;
+how that data is synchronized is out of the scope of this document,
+which instead focuses on how to manage static assets.
+
+## How HTTP caching works
+
+Caching is a very important and complex feature in modern web engines
+to improve page load time and reduce bandwidth consumption.
+[RFC7234] defines the mechanisms that control caching in the HTTP protocol
+regardless of its transport or serialization,
+which means that the same mechanisms apply to HTTPS and HTTP2 in the same way.
+
+HTTP has provisions for several use cases:
+* preventing highly dynamic resources from being cached
+* letting clients know for how long it is acceptable to use cached data
+* optimizing validation of cached entries to skip the download of the body
+  if the copy on the client still matches the one on the server
+* informing clients about resources that can be safely used
+  even if stale when no connection is available,
+  and which ones must return an error
+
+Caching is generally available only for the `GET` method
+and is controlled by the server for every single HTTP resource
+by adding the `Cache-control` header to its responses:
+this instructs the client (the web engine) on the ways
+it can store the retrieved contents
+and re-use them to skip the download on subsequent requests.
+
+One of the most important uses of the `Cache-control` header
+is to disable any kind of caching on highly dynamic generated resources,
+by specifying the `no-store` value.
+
+The `public` and `private` directives instruct clients
+that the resource can be stored in the local cache
+(`public` also allows for caching in intermediate proxy servers,
+a feature which is progressively becoming obsolete
+as it conflicts with the confidentiality requirements of HTTPS/TLS).
+
+The `Expires` header and the `max-age` directive let the server instruct the client
+for how long it can consider the cached resource valid.
+
+The client can completely skip any network access
+as long as the cached resource is “freshâ€;
+otherwise it has to validate it against the server.
+This does not mean that a complete re-download is always needed:
+using conditional requests, that is using the `If-Modified-Since` or `If-None-Match` headers
+to pass the values of the `Last-Modified` or `ETag` headers from the previous request,
+the download of the body is skipped if the values match
+and only headers will be transferred, with a `304 Not Modified` response.
+
+The HTML5 specification recently introduced the concept
+of [application cache][Offline Web applications],
+which caters for an additional, higher-level use case:
+pro-actively downloading all the resources needed by an HTML application
+for offline usage.
+
+This works by adding a `manifest` attribute
+to the `<html>` element of the main application page,
+and using it to indicate the URL of a specially formatted resource
+that lists all the URLs the client needs to pro-actively retrieve
+in order to be able to run the application correctly when offline.
+The caching model used by this specification
+is somewhat less refined than the one used by the HTTP specification,
+and for this reason it needs some special attention
+on how to ensure that the application is properly refreshed
+when changes are made on the server.
+
+The more complex and powerful [Service Workers] specification is meant to replace this,
+but it is not yet supported by all modern browsers
+(it works in Firefox and Chrome; WebKit and Edge don't support it yet).
+The specification has been stable for more than a year, despite not being finalized yet.
+The WebKit team has not yet shown a clear interest in implementing it,
+which may be the reason why the specification is still in its current status.
+
+## Caching in WebKit
+
+WebKit currently has several caches:
+* a non-persistent, in-memory cache of rendered pages,
+  which is set to 2 pages if the total RAM is greater than or equal to 512MB
+* a non-persistent, in-memory [decoded/parsed object cache](https://trac.webkit.org/browser/trunk/Source/WebKit2/Shared/CacheModel.cpp#L83),
+  set to 128MB if the total RAM is greater than or equal to 2GB
+  and progressively lowered as the amount of total RAM decreases
+* a persistent, on-disk [resources cache](https://trac.webkit.org/browser/trunk/Source/WebKit2/Shared/CacheModel.cpp#L158)
+  of 500MB if there are more than 16GB free on the disk,
+  progressively scaling down to 50MB if less than 1GB is available.
+
+Those sizes are computed automatically, but they can be customized to fit any requirements.
+
+When a new resource needs to be cached, WebKit
+makes sure that the upper bound is respected and frees older cache entries in an LRU pattern
+to make enough room to accommodate the resource which is about to be downloaded.
+
+Downloaded contents to be stored in the on-disk URL cache are directly saved in the filesystem,
+using the normal buffering that the kernel does
+for every application to improve performance and minimize eMMC wear.
+Wear is further minimized by the fact that only contents marked for caching by the server
+using the appropriate HTTP headers will be cached:
+highly dynamic contents like news tickers won't be marked as cacheable,
+so they won't impact the eMMC at all.
+
+The application cache is handled separately and it is unlimited by default,
+but this is a setting that can be changed.
+
+All the resources are stored in a SQLite database as data blobs,
+except for audio and video resources, where only the metadata is stored in the database
+and the contents are stored separately.
+
+To use the application cache effectively in WebKitGTK+,
+some implementation work would be required to limit the maximum size,
+as the WebKit core hooks are currently not used by the WebKitGTK+ port,
+and the WebKit core itself does not currently provide
+any expiration policy for the cached contents.
+
+## Client/Server implementation strategies
+
+Multiple strategies can be used to implement the previously defined system;
+they affect the design of the client and of the contents offered by the portal server.
+
+### Application cache
+
+The main HTML page of the portal links to an appcache manifest
+that instructs the browser to pro-actively fetch all the needed resources
+(an illustrative manifest is sketched further below).
+
+All subsequent accesses to the portal will be served from the cached copy,
+regardless of the availability of an Internet connection.
+
+If the portal is accessed when an Internet connection is available,
+the browser will retrieve the appcache manifest from the server in the background
+and check for modifications:
+if a new version is detected, the portal resources will be refreshed in the background
+and will be used for subsequent accesses to the portal.
+
+Each application will have its own appcache manifest,
+so it will be locally cached after the first visit.
+
+To ensure that the portal is available on first boot
+even if no Internet connection is available,
+during the process of generating the system image
+the browser will be launched in a special mode
+that will cause it to connect to the portal, populate the application cache
+and exit as soon as the `ApplicationCache::updateready` event is fired.
+An ad-hoc program using WebKit may be used
+instead of adding a special mode to the browser.
+
+This is the simplest and most portable approach on the client side,
+as all the caching logic is provided by the portal server using standard W3C mechanisms.
+
+### Custom HTTP application caching server running locally
+
+Alternatively, the browser can be instructed to connect to a custom HTTP proxy server
+running locally instead of directly to the portal server.
+
+Since TLS authentication cannot work appropriately through proxy servers,
+it is taken care of by the proxy server itself,
+with the browser talking to the local proxy over unencrypted HTTP
+and the proxy converting HTTP requests to HTTPS.
+
+This means that unencrypted communications will only happen locally between trusted components,
+while all the network traffic will be encrypted.
+Just like for any other HTTP error,
+the proxy can return error pages to the browser in case of a TLS error
+(for instance, if the server certificate is expired)
+or return cached contents if available.
+
+The custom proxy is then responsible for connecting to the portal server
+and retrieving updated contents from there,
+locally caching them with any kind of expiry and refresh policy desired,
+and processing cached resources when needed,
+for instance by rewriting links from HTTPS to HTTP.
+
+The browser needs to be configured to reduce its own caching to a minimum,
+since the smart proxy already takes care of caching.
+
+During the manufacturing process the proxy cache will be preloaded
+with the resources hosted by the portal server.
+
+This is the most flexible approach.
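+
+For reference, the appcache manifest that the application-cache strategy
+described above relies on is a plain text file along these lines
+(the file names are purely illustrative):
+
+```
+CACHE MANIFEST
+# v1 - bump this comment to force clients to refresh their copies
+
+CACHE:
+index.html
+portal.css
+portal.js
+
+NETWORK:
+*
+```
+
+Since browsers only compare the cached manifest byte-for-byte
+against the one on the server,
+changing the version comment is the usual way
+to make clients pick up updated resources.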
+
+### Separately-maintained locally accessible copy of the portal contents
+
+Instead of having a locally running custom HTTP caching proxy,
+the portal contents are stored as plain files on the system.
+The browser will contain custom logic to load the local HTML file
+instead of the portal URL when no Internet connection is available.
+
+A separate process will periodically compare the locally-stored HTML file and resources
+against the portal server and refresh the local copy.
+
+This is the least flexible choice,
+and the locally stored copies cannot be used as a cache to speed up rendering
+when the connection to the Internet is available.
+
+[Web Storage API]: https://html.spec.whatwg.org/multipage/webstorage.html
+[Indexed Database API]: https://www.w3.org/TR/IndexedDB/
+[RFC7234]: https://tools.ietf.org/html/rfc7234
+[Offline Web applications]: https://html.spec.whatwg.org/multipage/browsers.html#offline
+[Decoded/parsed object cache]: https://trac.webkit.org/browser/trunk/Source/WebKit2/Shared/CacheModel.cpp#L83
+[Resources cache]: https://trac.webkit.org/browser/trunk/Source/WebKit2/Shared/CacheModel.cpp#L158
+[Service Workers]: https://www.w3.org/TR/service-workers/
diff --git a/content/designs/x86-build-infrastructure.md b/content/designs/x86-build-infrastructure.md
new file mode 100644
index 0000000000000000000000000000000000000000..1631af6d9f97f48a3bedfe6496daf7119b7786eb
--- /dev/null
+++ b/content/designs/x86-build-infrastructure.md
@@ -0,0 +1,248 @@
+---
+title: Build infrastructure on Intel x86-64
+short-description: Hosting the build infrastructure on Intel x86-64-only providers
+authors:
+  - name: Emanuele Aina
+---
+
+# Build infrastructure on Intel x86-64
+
+## Introduction
+
+The current Apertis infrastructure is largely made up of hosts based on the Intel
+x86-64 architecture, often using virtualized machines.
+
+The only exceptions are:
+* OBS workers used to build packages natively for the ARM 32 bit and
+  ARM 64 bit architectures
+* LAVA workers, which match the [reference hardware
+  platforms](https://wiki.apertis.org/Reference_Hardware)
+
+While LAVA workers are by nature meant to be hosted separately from the rest
+of the infrastructure and are handled via [geographically distributed LAVA
+dispatchers](https://gitlab.apertis.org/infrastructure/apertis-lava-docker/blob/master/apertis-lava-dispatcher/README.md),
+the constraint on the OBS workers is problematic for adopters that want to host
+a downstream Apertis infrastructure.
+
+## Why host the whole build infrastructure on Intel x86-64
+
+Being able to host the build infrastructure solely on Intel x86 64 bit
+(usually referred to as `x86-64` or `amd64`) machines enables downstream
+Apertis instances to be hosted on standard public or private cloud solutions, as these
+usually only offer x86-64 machines.
+
+Deploying the OBS workers on cloud providers would also allow for implementing
+elastic workload handling.
+
+Elastic scaling, and the desire to ensure that the cloud approach is tested
+and viable for downstreams, mean that deploying the approach described in
+this document is of interest for the main Apertis infrastructure, not just
+for downstreams.
+
+Some cloud providers like Amazon Web Services have recently started offering ARM
+64 bit servers as well, so it should always be possible to adopt a hybrid
+approach mixing foreign builds on x86-64 and native ones on ARM machines.
+
+In particular, Apertis is currently committed to maintaining native workers for all
+the supported architectures, aiming for a hybrid setup where foreign packages
+get built on a mix of native and non-native Intel x86 64 bit machines.
+
+Downstreams will be able to opt for fully native, hybrid or Intel-only OBS
+worker setups.
+
+## Why OBS workers need a native environment
+
+Development environments for embedded devices often rely on cross-compilation to
+build software targeting a foreign architecture from x86-64 build hosts.
+
+However, pure cross-compilation prevents running the unit tests that are
+shipped with the projects being built, since the binaries produced do not match
+the current machine.
+
+In addition, supporting cross-compilation across all the projects that compose a
+Linux distribution involves a considerable effort, since not all build systems
+support cross-compilation, and where it is supported some features may still be
+incompatible with it.
+
+From the point of view of upstream projects, cross-compilation is in general
+a less tested path, which often leads cross-building distributors to ship a
+considerable number of patches adding fixes and workarounds.
+
+For this reason all major package-based distributions like Fedora, Ubuntu, SUSE
+and in particular Debian, the upstream distribution from which Apertis sources
+most of its packages, choose to only officially support native compilation for
+their packages.
+
+The Debian infrastructure thus hosts machines with different
+CPU architectures, since build workers must run on hardware that matches the
+architecture of the binary package being built.
+
+Apertis inherits this requirement, and currently has build workers with
+Intel 64 bit, ARM 32 and 64 bit CPUs.
+
+## CPU emulation
+
+Using the right CPU is fortunately not the only way to execute programs for
+non-Intel architectures: the [QEMU project](https://www.qemu.org/) provides
+the ability to emulate a multitude of platforms on an x86-64 machine.
+
+QEMU offers two main modes:
+* system mode: emulates a full machine, including the CPU and a set of attached
+  hardware devices;
+* user mode: translates CPU instructions on a running Linux system, running
+  foreign binaries as if they were native.
+
+The system mode is useful when running entire operating systems, but it has a
+severe performance impact.
+
+The user mode has a much lighter impact on performance, as it only deals with
+translating the CPU instructions in a Linux executable, for instance running
+an ARMv7 ELF binary on top of the x86-64 kernel running on an x86-64 host.
+
+## Using emulation to target foreign architectures from x86-64
+
+The build process on the OBS workers already involves setting up a chroot where
+the actual compilation happens. By combining this with the static variant of the
+QEMU user mode emulator, software can be built on an x86-64 host
+targeting a foreign architecture as if it were a native build.
+
+The [binfmt_misc](https://en.wikipedia.org/wiki/Binfmt_misc) subsystem in the
+kernel can be used to make the emulation transparent, so that it happens
+automatically when a foreign binary is executed (an illustrative
+registration string is shown below).
+Packages can then be built for foreign architectures without any changes.
+
+The emulation-based compilation is also known as
+[Type 4 cross-build](https://en.opensuse.org/openSUSE:Build_Service_Concept_CrossDevelopment#Types_of_crossbuild)
+in the OBS documentation.
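+
+As an illustration, registering the emulator for ARM 32 bit binaries amounts to
+writing a configuration string to `/proc/sys/fs/binfmt_misc/register`, roughly
+like the one below, which is what a standard `qemu-user-static` setup installs
+(the magic and mask match the initial bytes of ARM ELF headers; the exact bytes
+shown here are illustrative):
+
+```
+:qemu-arm:M::\x7fELF\x01\x01\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x02\x00\x28\x00:\xff\xff\xff\xff\xff\xff\xff\x00\xff\xff\xff\xff\xff\xff\xff\xff\xfe\xff\xff\xff:/usr/bin/qemu-arm-static:
+```
+
+The fields are colon-separated: a name, the `M` (magic) matching type, an
+offset, the magic bytes, a mask, and the interpreter to invoke. Once
+registered, the kernel transparently runs `/usr/bin/qemu-arm-static` whenever
+an ARM ELF binary is executed, which is what makes the chroot-based foreign
+builds possible without any changes to the packages.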
+
+The following diagram shows how the OBS backend can distribute build jobs to
+its workers.
+
+Each CPU instruction set is marked by the codename used by OBS:
+* `x86_64`: the Intel x86 64 bit ISA, also known as `amd64` in Debian
+* `armv7hl`: the ARMv7 32 bit Hard Float ISA, also known as `armhf` in Debian
+* `aarch64`: the ARMv8 64 bit ISA, also known as `arm64` in Debian
+
+
+
+Particularly relevant here are the `armv7hl` jobs building ARMv7 32 bit packages
+that can be dispatched to:
+
+1. the native `armv7hl` worker machine;
+1. the `aarch64` worker machine, which supports the ARMv7 32 bit ISA natively
+   and thus can run binaries in `armv7hl` chroots natively;
+1. the `x86_64` worker machine, which uses the `qemu-arm-static` binary
+   translator to run binaries in `armv7hl` chroots via emulation.
+
+It's worth noting that some ARM 64 bit server systems do not support the ARMv7
+32 bit ISA natively, and would thus require the same emulation-based approach
+used on the x86-64 machines to execute the ARM 32 bit jobs.
+
+## Mitigating the impact on performance
+
+The most obvious way to handle the performance penalty is to use faster CPUs.
+Cloud providers offer a wide range of options for x86-64 machines, and
+establishing the appropriate cost/performance balance is the first step.
+It is possible that the performance of an emulated build on a fast x86-64 CPU
+may be comparable to, or even better than, that of a native build on an older
+ARMv7 machine.
+
+In addition, compilation is often a largely parallel task:
+1. big software projects like WebKit are made of many compilation units that
+   can be built in parallel
+2. during large scale rebuilds each package can be built in parallel
+
+Even if some phases of the build process do not benefit from multiple cores,
+most of the time is spent on processing the compilation units, which means that
+increasing the number of cores on the worker machines can effectively mitigate
+the slowdown due to emulation on large packages.
+
+For large scale rebuilds, scaling the number of machines is already helpful,
+as the build process for each package is isolated from the others.
+
+A different optimization would be to use selected native-architecture binaries
+during the QEMU user-mode emulation. For instance, a real cross-compiler can be
+injected into the build chroot and made to pretend to be the "native" compiler
+in the otherwise emulated environment.
+
+This would give the best possible performance, as the compilation is done with
+native `x86-64` code, but care has to be taken to ensure that the
+cross-compiler can run reliably in the foreign chroot, and keeping the
+native and emulated versions synchronized can be challenging.
+
+## Risks
+
+### Limited maturity of the support for cross-builds in OBS
+
+Support for injecting the QEMU static emulator into the OBS build chroot seems
+to be well tested only on RPM-based systems, and there may be some issues with
+the DEB-based approach used by Apertis.
+
+A feasibility study done by Collabora in the past demonstrated the viability
+of the approach, but some issues may need to be dealt with to deploy it at
+scale.
+
+### Versioning mismatches between emulated and injected native components
+
+If native components are injected into the otherwise emulated cross-build
+environment to mitigate the impact on performance, particular care must be
+taken to ensure that the versions match.
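+
+A hypothetical guard against such mismatches could compare the two toolchains
+before each build and fall back to the fully emulated compiler when they
+diverge; the chroot path and compiler names below are illustrative assumptions:
+
+```sh
+#!/bin/sh
+# Sketch: only enable the injected native cross-compiler if its version
+# matches the emulated compiler already shipped in the foreign chroot
+CHROOT=/srv/armhf-chroot
+
+emulated=$(sudo chroot "$CHROOT" gcc -dumpversion)
+native=$(arm-linux-gnueabihf-gcc -dumpversion)
+
+if [ "$emulated" = "$native" ]; then
+    echo "gcc $native: versions match, native cross-compiler can be injected"
+else
+    echo "mismatch (chroot: $emulated, native: $native): keeping emulated gcc" >&2
+    exit 1
+fi
+```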
+
+### Impact of performance loss on timing-dependent tests
+
+Some unit tests shipped in upstream packages can be very sensitive to timing
+issues, failing on slower machines. If the performance impact is non-trivial,
+the emulated environment may be subject to the same failures.
+
+However, this is not specific to the emulated environment: Apertis often faces
+this kind of issue when tests that pass on the main Apertis infrastructure
+fail due to timing issues on the slower workers that downstream distributions
+may use.
+
+To mitigate the impact on downstream distributors, the flaky tests usually get
+fixed or, if the effort required is too large, disabled.
+
+### Emulation bugs
+
+The emulator may have bugs that get triggered by the build process of some
+packages.
+
+Since upstream distributors use native workers, those issues may not be caught
+before the triggering package is built on the Apertis infrastructure.
+
+Debugging this kind of issue is often not trivial.
+
+## Approach
+
+These are the high level steps to be undertaken to be able to run the whole
+Apertis build infrastructure on x86-64 machines:
+
+* Set up an OBS test instance with a single `x86-64` worker
+* Configure the test instance and worker for `armhf` and `aarch64` emulated builds
+* Test a selected set of packages by building them for `armhf` and `aarch64`
+* Set up other `x86-64` workers and test a rebuild of the whole archive,
+  ensuring that all the packages can be built using the emulated approach
+* Devise mitigations in case some packages fail to build in the emulated
+  environment
+* Measure and evaluate the performance impact by comparing build times with
+  those on the native workers currently in use in Apertis, to decide whether
+  scaling the number of workers is sufficient to compensate for the impact
+* Test mitigation approaches over a selected set of packages and evaluate the gains
+* Do another rebuild of the whole archive to ensure that the mitigations didn't
+  introduce regressions
+* Refine and deploy the chosen mitigation approaches to, for instance, ensure
+  that the injected native binaries are kept synchronized with the emulated
+  ones they replace
+
+There's a risk that no mitigation ends up being effective for some packages,
+so they keep failing in the emulated environment. In the short term those
+packages will be required to be built on the native workers in a hybrid setup,
+but they would be more problematic in a hypothetical downstream setup with no
+native workers, as they could not be built there. In that case, pre-built
+binaries coming from an upstream with native workers will have to be injected
+into the archive.
+
+Alternatively, it may be possible to mix [type 3 and 4
+crossbuilds](https://en.opensuse.org/openSUSE:Build_Service_Concept_CrossDevelopment#Types_of_crossbuild)
+by modifying the failing packages to make them buildable with a real
+cross-compiler. This solution carries a much higher maintenance cost, as
+packages do not generally support being built in that way, but it may be an
+option for doing full builds on x86-64 in the few cases where emulation fails.
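+
+Outside of OBS, the difference between the two build types can be sketched
+with Debian's `sbuild` tool; the source package and suite below are
+placeholders, and suitable build chroots are assumed to exist for both setups:
+
+```sh
+# Type 4 (emulated) build: an armhf chroot whose binaries run through
+# qemu-arm-static; the package needs no cross-compilation support
+sbuild --arch=armhf --dist=bullseye foo_1.0-1.dsc
+
+# Type 3 (cross) build: a native x86-64 toolchain targeting armhf;
+# faster, but only usable for packages that support cross-compilation
+sbuild --build=amd64 --host=armhf --dist=bullseye foo_1.0-1.dsc
+```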