How it works

Explanations of how some key components work and how they are structured.

Boris Lechner 2025-05-05 e022507882f1c7d53ec4dc72b08922261dfdd25f

Subsections of How it works

hermes-server

Explanations of how some key components of hermes-server work and how they are structured.


Subsections of hermes-server

Workflow

hermes-server

  • 1. loads its local cache
  • 2. checks if its dataschema has changed since the last run, and emits the resulting removed events (if any) and the new dataschema
  • 3. fetches all data required by its datamodel from datasource(s)
    • 3.1. enforces merge constraints
    • 3.2. merges data
    • 3.3. replaces inconsistencies and merge conflicts with cached values
    • 3.4. enforces integrity constraints
  • 4. generates a diff between its cache and the fetched remote data
  • 5. loops over each diff type: added, modified, removed
    • 5.1. for each diff type, loops over each data type in its declaration order in the datamodel, except for the removed diff type, for which the reverse declaration order is used
      • 5.1.1. loops over each diff item of the current data type
        • 5.1.1.1. generates the corresponding event
        • 5.1.1.2. emits the event on the message bus
        • 5.1.1.3. if the event was successfully emitted:
          • 5.1.1.3.1. runs the datamodel commit_one action, if any
          • 5.1.1.3.2. updates the cache to reflect the new value of the item affected by the event
  • 6. once all events have been emitted:
    • 6.1. runs the datamodel commit_all action, if any
    • 6.2. saves the cache on disk
  • 7. waits for updateInterval and restarts from step 3. if the app has not been requested to stop

If any exception is raised in step 2., this step is restarted until it succeeds.

If any exception is raised in steps 3. to 7., the cache is saved on disk, and the server restarts from step 3.
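The event-emission loop of step 5. can be sketched as follows. This is an illustrative stand-in, not the real Hermes implementation: the function name, the diff layout, and the event fields are all hypothetical.

```python
# Hypothetical sketch of workflow step 5.: emit events per diff type,
# honoring datamodel declaration order (reversed for "removed" so that
# dependent entries are removed before the entries they refer to).
DECLARATION_ORDER = ["Users", "Groups", "GroupsMembers"]  # example datamodel order

def process_diff(diff: dict, cache: dict) -> list:
    emitted = []
    for difftype in ("added", "modified", "removed"):
        order = DECLARATION_ORDER if difftype != "removed" else list(reversed(DECLARATION_ORDER))
        for datatype in order:
            for pkey, attrs in diff.get(difftype, {}).get(datatype, {}).items():
                event = {"eventtype": difftype, "objtype": datatype,
                         "objpkey": pkey, "objattrs": attrs}   # 5.1.1.1
                emitted.append(event)                          # 5.1.1.2: emit on bus
                if difftype == "removed":                      # 5.1.1.3.2: update cache
                    cache[datatype].pop(pkey, None)
                else:
                    cache[datatype][pkey] = attrs
    return emitted
```

The reversed order for removals mirrors the integrity constraints described below: a GroupsMembers entry must disappear before the Users or Groups entry it points to.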


Integrity constraints

Hermes-server can handle several data types, with links (foreign keys) between them, and can enforce integrity constraints.

Let’s use a typical Users / Groups / GroupsMember use case to illustrate this.

classDiagram
    direction BT
    GroupsMembers <-- Users
    GroupsMembers <-- Groups
    class Users{
      user_id
      ...
    }
    class Groups{
      group_id
      ...
    }
    class GroupsMembers{
      user_id
      group_id
      integrity() _SELF.user_id in Users_pkeys and _SELF.group_id in Groups_pkeys
    }

In this scenario, entries in GroupsMembers that have a user_id that doesn’t exist in Users, or a group_id that doesn’t exist in Groups will be silently ignored.
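The filtering behaves roughly like the sketch below, assuming each data type is a list of dicts (as in the server cache files); the function name is hypothetical.

```python
# Sketch of the integrity constraint above: GroupsMembers entries whose
# foreign keys don't resolve to an existing Users/Groups pkey are
# silently dropped.
def enforce_integrity(users: list, groups: list, groups_members: list) -> list:
    users_pkeys = {u["user_id"] for u in users}
    groups_pkeys = {g["group_id"] for g in groups}
    # Equivalent of: _SELF.user_id in Users_pkeys and _SELF.group_id in Groups_pkeys
    return [m for m in groups_members
            if m["user_id"] in users_pkeys and m["group_id"] in groups_pkeys]
```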

For more details, please see integrity_constraints in hermes-server configuration.


Multi source data aggregation

In a multi-source scenario, Hermes can aggregate entries coming from multiple sources as if they came from a single one, and optionally enforce aggregation constraints to ensure data consistency.

Let's take a use case with a university data set where Hermes should manage user accounts. Employee and student data are stored on two separate data sources. Hermes will be able to merge the two data sources into one virtual Users data type, but must ensure that primary keys don't collide.

Here we have two distinct data sources for the same data type.

classDiagram
    direction BT
    Users <|-- Users_employee
    Users <|-- Users_students
    class Users{
      user_id
      login
      mail
      merge_constraints() s.user_id mustNotExist in e.user_id
    }
    class Users_students{
      s.user_id
      s.login
      s.mail
    }
    class Users_employee{
      e.user_id
      e.login
      e.mail
    }

In this scenario, entries in Users_students that have a user_id that exists in Users_employee will be silently ignored.
But entries in Users_employee that have a user_id that exists in Users_students will still be processed.
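The mustNotExist constraint above amounts to something like this sketch (hypothetical function name, data types as lists of dicts):

```python
# Sketch of the "s.user_id mustNotExist in e.user_id" merge constraint:
# student entries whose pkey collides with an employee pkey are silently
# dropped; employee entries are always kept.
def aggregate_users(employees: list, students: list) -> list:
    employee_pkeys = {e["user_id"] for e in employees}
    kept_students = [s for s in students if s["user_id"] not in employee_pkeys]
    return employees + kept_students
```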

For more details, please see pkey_merge_constraint and merge_constraints in hermes-server configuration.


Multi source data merging

In a multi-source scenario, Hermes can recompose entries coming from multiple sources by merging their data, and optionally set merge constraints to ensure data consistency.

Let's take a use case where Hermes should manage user accounts. The main data and the wifi profile name are stored on two separate data sources. Hermes will be able to aggregate the two data sources into one virtual Users data type, but must ensure that the primary keys of the second exist in the first.

Here we have two distinct data sources for the same entry.

classDiagram
    direction BT
    Users <|-- Users_main
    Users <|-- Users_auxiliary
    class Users{
      user_id
      login
      mail
      wifi_profile
      merge_constraints() a.user_id mustAlreadyExist in m.user_id
    }
    class Users_auxiliary{
      a.user_id
      a.wifi_profile
    }
    class Users_main{
      m.user_id
      m.login
      m.mail
    }

In this scenario, entries in Users_auxiliary that have a user_id that doesn't exist in Users_main will be silently ignored.
But entries in Users_main that have a user_id that doesn't exist in Users_auxiliary will still be processed, and the resulting Users entry therefore won't have a wifi_profile value.
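The mustAlreadyExist constraint and the merge behave roughly as sketched below (hypothetical function name, data types as lists of dicts):

```python
# Sketch of the "a.user_id mustAlreadyExist in m.user_id" constraint:
# auxiliary entries are merged into the matching main entry; auxiliary
# entries with no matching main pkey are silently ignored.
def merge_users(main: list, auxiliary: list) -> list:
    aux_by_pkey = {a["user_id"]: a for a in auxiliary}
    merged = []
    for m in main:
        entry = dict(m)
        aux = aux_by_pkey.get(m["user_id"])
        if aux is not None:
            entry["wifi_profile"] = aux["wifi_profile"]
        # main entries without auxiliary data are kept, just without wifi_profile
        merged.append(entry)
    return merged
```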

For more details, please see pkey_merge_constraint and merge_constraints in hermes-server configuration.


Events emitted

Event categories

An event always belongs to one of these categories:

  • base: standard event; can be of type:

    • dataschema: propagate the new dataschema to clients, after a server datamodel update
    • added: a new entry has been added to specified data type, with specified attributes and values
    • removed: entry of specified pkey has been removed from specified data type
    • modified: entry of specified pkey has been modified. Contains only added, modified, and removed attributes with their new values
  • initsync: indicates that the event is part of an initsync sequence; can be of type:

    • init-start: beginning of an initsync sequence, also contains the current dataschema
    • added: a new entry has been added to specified data type, with specified attributes and values. As the server sends the contents of its cache to initialize clients, entries can only be added
    • init-stop: end of an initsync sequence
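For illustration, an added and a modified event could carry payloads like the dict literals below. The field names mirror those shown in the autoremediation example later in this documentation; the exact wire format is an assumption here.

```python
# Hypothetical event payloads (field names follow the examples in the
# autoremediation section; not an authoritative wire format).
added_event = {
    "eventtype": "added",
    "objtype": "ADGroup",
    "objpkey": 42,
    # added events carry the full set of attributes and values
    "objattrs": {"grp_pkey": 42, "name": "DemoGroup", "desc": "Demo group"},
}

modified_event = {
    "eventtype": "modified",
    "objtype": "ADGroup",
    "objpkey": 42,
    # modified events carry only added/modified/removed attributes, with new values
    "objattrs": {"modified": {"desc": "Renamed demo group"}},
}
```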


Cache files

_hermes-server.json

Contains the state of the server:

  • lastUpdate: datetime.datetime | None

    Datetime of latest update.

  • errors: dict[str, dict[str, dict[str, Any]]]

    Dictionary containing current errors, to be able to notify of any changes.

  • exception: str | None

    String containing latest exception trace.

_dataschema.json

Contains the Dataschema, built upon the Datamodel. This cache file allows the server to perform step 2. of the Workflow.

DataType.json

There's one file per data type declared in the Datamodel, containing the data cache of that data type as a list of dicts. Each dict in the list is an entry.
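For example, a Users.json cache file could look like the sketch below (the entry attributes are illustrative, not a required schema):

```python
import json

# Sketch of a DataType.json cache file: a JSON list of dicts,
# one dict per entry of that data type.
cache_content = '[{"user_id": 1, "login": "jdoe"}, {"user_id": 2, "login": "asmith"}]'

entries = json.loads(cache_content)  # -> list of entry dicts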


hermes-client

Explanations of how some key components of hermes-client work and how they are structured.


Subsections of hermes-client

Workflow

hermes-client

  • 1. loads its datamodel from the config file
  • 2. if it exists, loads the previous datamodel from cache
  • 3. if any, notifies about datamodel warnings: remote types and remote attributes present in the datamodel, but not in the dataschema
  • 4. if a remote schema exists in cache, loads the error queue from cache
  • 5. if the client has not been initialized yet (no complete initSync sequence has been processed yet):
    • 5.1. processes the initSync sequence, if a complete initSync sequence is available on the message bus
    • 5.2. restarts from step 5.
  • 6. if the client has already been initialized (a complete initSync sequence has already been processed):
    • 6.1. if it is the first iteration of the loop (step 7. has never been reached):
      • 6.1.1. if the datamodel in config differs from the cached one, processes the datamodel update:
        • 6.1.1.1. generates removed events for all entries of removed data types, processes them, and purges those data types' cache files
        • 6.1.1.2. generates a diff between the cached data built upon the previous datamodel and the same data converted to the new datamodel, then generates the corresponding events and processes them
    • 6.2. if errorQueue_retryInterval has passed since the last attempt, retries to process the events in the error queue
    • 6.3. if trashbin_purgeInterval has passed since the last attempt, retries to purge expired objects from the trashbin
    • 6.4. loops over all events available on the message bus, and processes each one by calling its corresponding handler when it exists in the client plugin
  • 7. when at least one event was processed, or if the app was requested to stop:
    • 7.1. saves the cache files of the error queue, app, and data
    • 7.2. calls the special handler onSave when it exists in the client plugin
    • 7.3. notifies of any change in the error queue
  • 8. restarts from step 5. if the app hasn't been requested to stop

If any exception is raised in step 6.1.1., it is considered a fatal error and notified, and the client stops.

If any exception is raised in steps 5. to 6., it is notified, its event is added to the error queue, and the client restarts from step 7.


Event processing

As the datamodel on the server differs from that on the client, clients must convert remote events received on the message bus into local events. If the resulting local event is empty (the data type or the attributes changed in the remote event are not set in the client datamodel), the event is ignored.

On a client datamodel update, the client may generate local events that have no corresponding remote event, e.g. to update an attribute value computed with a Jinja template that has just been updated.
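The conversion described above can be sketched like this, using a simplified flat remote-to-local attribute mapping; the function name and the datamodel shape are hypothetical, not the actual hermes-client API.

```python
# Sketch of remote-to-local event conversion: attributes absent from the
# client datamodel are stripped, and an empty result means the event is
# ignored (returned as None here).
def convert_event(remote_event: dict, client_datamodel: dict):
    attrs_map = client_datamodel.get(remote_event["objtype"])
    if attrs_map is None:
        return None  # data type not declared in client datamodel: ignore event
    local_attrs = {local: remote_event["objattrs"][remote]
                   for remote, local in attrs_map.items()
                   if remote in remote_event["objattrs"]}
    if not local_attrs:
        return None  # none of the changed attributes are mapped: ignore event
    return {**remote_event, "objattrs": local_attrs}
```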

flowchart TB
  subgraph Hermes-client
    direction TB
    datamodelUpdate[["a datamodel update"]]
    remoteevent["Remote event"]
    localevent["Local event"]
    eventHandler(["Client plugin event handler"])
  end
  datamodelUpdate-->|generate|localevent
  MessageBus-->|produce|remoteevent
  remoteevent-->|convert to|localevent
  localevent-->|pass to appropriate|eventHandler
  eventHandler-->|process|Target

  classDef external fill:#fafafa,stroke-dasharray: 5 5
  class MessageBus,Target external


Foreign keys

Sometimes, objects are linked together by foreign keys. When an error occurs on an object whose primary key refers to that of one or more other "parent" objects, it may be desirable to interrupt the processing of all or part of the parent objects' events until this first event has been correctly processed. This can be done by adding the parent objects' events to the error queue instead of trying to process them.

The first thing to do is to declare the foreign keys through hermes-server.datamodel.data-type-name.foreignkeys in hermes-server configuration. The server will do nothing with these foreign keys except propagate them to the clients.

Then, it is necessary to establish which policy to apply to the clients through hermes-client.foreignkeys_policy in each hermes-client configuration. There are three:

  • disabled: No event, policy is disabled. Probably not relevant in most cases, but could perhaps be useful to someone one day.
  • on_remove_event: Only on removed events. Should be enough in most cases.
  • on_every_event: On every event type (added, modified, removed). To ensure perfect consistency no matter what.
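The policy check can be pictured as the sketch below, where the helper decides whether a parent object's event should be queued instead of processed. The function and the error-queue representation are hypothetical stand-ins, not the real hermes-client implementation.

```python
# Hypothetical sketch of applying foreignkeys_policy: an event on a
# parent object is queued when a child referencing it is already stuck
# in the error queue, depending on the configured policy.
def must_enqueue(event: dict, policy: str, parent_pkeys_in_error_queue: set) -> bool:
    if policy == "disabled":
        return False  # policy disabled: never queue on behalf of children
    if policy == "on_remove_event" and event["eventtype"] != "removed":
        return False  # only removed events are held back under this policy
    # "on_every_event", or a removed event under "on_remove_event"
    return event["objpkey"] in parent_pkeys_in_error_queue
```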


Auto remediation

Sometimes, an event may be stored in the error queue due to a data problem (e.g. a group name with a trailing dot will raise an error on Active Directory). If the trailing dot is then removed from the group name in the datasource, the modified event will also be stored in the error queue, and won't be processed until the previous one is, which cannot happen without proceeding to a risky and undesirable operation: manually editing the client cache file.

Autoremediation solves this type of problem by merging the events of a same object in the error queue. It is not enabled by default, as it may break the regular processing order of events.

Example

Let's take an example with a group created with an invalid name. As its name is invalid, its processing will fail, and the event will be stored in the error queue like this:

flowchart TB
  subgraph errorqueue [Error queue]
    direction TB
    ev1
  end

  ev1["`**event 1**
    &nbsp;
    *eventtype*: added
    *objType*: ADGroup
    *objpkey*: 42
    *objattrs*: {
    &nbsp;&nbsp;grp_pkey: 42
    &nbsp;&nbsp;name: 'InvalidName.'
    &nbsp;&nbsp;desc: 'Demo group'
    }`"]

  classDef leftalign text-align:left
  class ev1 leftalign

As the error has been notified, someone corrects the group name in the datasource. This change will lead to a corresponding modified event. This modified event will not be processed, but added to the error queue, as its object already has an event in the error queue.

  • without autoremediation, until the first event has been successfully processed, the second one will not even be tried: the fix is stuck.
  • with autoremediation, the error queue will merge the two events, and on the next processing of the error queue, the updated event will be successfully processed.
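The merge performed by autoremediation can be sketched as below; the function name and the handling of other event combinations are hypothetical, but the added + modified case reproduces the example events shown in the diagram.

```python
# Sketch of autoremediation: merge a "modified" event into a queued
# "added" event for the same object, producing a single up-to-date
# "added" event.
def merge_events(queued: dict, new: dict) -> dict:
    assert queued["objtype"] == new["objtype"] and queued["objpkey"] == new["objpkey"]
    if queued["eventtype"] == "added" and new["eventtype"] == "modified":
        merged_attrs = dict(queued["objattrs"])
        merged_attrs.update(new["objattrs"].get("modified", {}))  # apply new values
        for attr in new["objattrs"].get("removed", {}):
            merged_attrs.pop(attr, None)                          # drop removed attrs
        return {**queued, "objattrs": merged_attrs}
    return new  # other combinations are out of scope for this sketch
```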
flowchart TB
  subgraph errorqueuebis [With autoremediation]
    direction TB
    ev1bis
  end

  subgraph errorqueue [Without autoremediation]
    direction TB
    ev1
    ev2
  end

  ev1["`**event 1**
    &nbsp;
    *eventtype*: added
    *objType*: ADGroup
    *objpkey*: 42
    *objattrs*: {
    &nbsp;&nbsp;grp_pkey: 42
    &nbsp;&nbsp;name: 'InvalidName.'
    &nbsp;&nbsp;desc: 'Demo group'
    }`"]

  ev2["`**event 2**
    &nbsp;
    *eventtype*: modified
    *objType*: ADGroup
    *objpkey*: 42
    *objattrs*: {
    &nbsp;&nbsp;modified: {
    &nbsp;&nbsp;&nbsp;&nbsp;name: 'ValidName'
    &nbsp;&nbsp;}
    }`"]

  ev1bis["`**event 1**
    &nbsp;
    *eventtype*: added
    *objType*: ADGroup
    *objpkey*: 42
    *objattrs*: {
    &nbsp;&nbsp;grp_pkey: 42
    &nbsp;&nbsp;name: 'ValidName'
    &nbsp;&nbsp;desc: 'Demo group'
    }`"]

  classDef leftalign text-align:left
  class ev1,ev2,ev1bis leftalign


Cache files

_hermes-client-name.json

Contains the state of the client:

  • queueErrors: dict[str, str]

    Dictionary containing all error messages of objects in error queue, to be able to notify of any changes.

  • datamodelWarnings: dict[str, dict[str, dict[str, Any]]]

    Dictionary containing current datamodel warnings, for notifications.

  • exception: str | None

    String containing latest exception trace.

  • initstartoffset: Any | None

    Contains the offset of the first message of initSync sequence on message bus.

  • initstopoffset: Any | None

    Contains the offset of the last message of initSync sequence on message bus.

  • nextoffset: Any | None

    Contains the offset of the next message to process on message bus.

_hermesconfig.json

Cache of the previous config, used to rebuild the previous datamodel and to render the Jinja templates with Attribute plugins.

_dataschema.json

Cache of latest Dataschema, received from hermes-server.

_errorqueue.json

Cache of error queue.

RemoteDataType.json

One file per remote data type, containing all remote entries as they have been successfully processed.

When the error queue is empty, it must have the same content as RemoteDataType_complete__.json

RemoteDataType_complete__.json

One file per remote data type, containing all remote entries as they should be without errors.

When the error queue is empty, it must have the same content as RemoteDataType.json

trashbin_RemoteDataType.json

Only if trashbin is enabled. One file per remote data type, containing all remote entries that are in the trashbin, as they have been successfully processed.

When the error queue is empty, it must have the same content as trashbin_RemoteDataType_complete__.json

trashbin_RemoteDataType_complete__.json

Only if trashbin is enabled. One file per remote data type, containing all remote entries that are in the trashbin, as they should be without errors.

When the error queue is empty, it must have the same content as trashbin_RemoteDataType.json

__LocalDataType.json

One file per local data type, containing all local entries as they have been successfully processed.

When the error queue is empty, it must have the same content as __LocalDataType_complete__.json

__LocalDataType_complete__.json

One file per local data type, containing all local entries as they should be without errors.

When the error queue is empty, it must have the same content as __LocalDataType.json

__trashbin_LocalDataType.json

Only if trashbin is enabled. One file per local data type, containing all local entries that are in the trashbin, as they have been successfully processed.

When the error queue is empty, it must have the same content as __trashbin_LocalDataType_complete__.json

__trashbin_LocalDataType_complete__.json

Only if trashbin is enabled. One file per local data type, containing all local entries that are in the trashbin, as they should be without errors.

When the error queue is empty, it must have the same content as __trashbin_LocalDataType.json