Managing topics and access
Warning

This feature applies only to Aiven-hosted Kafka. On-premises Kafka is deprecated, and creating new topics on-premises was disabled in the summer of 2021. For on-premises Kafka, see the on-premises Kafka documentation.
Creating topics and defining access
Creating or modifying a `Topic` Kubernetes resource will trigger topic creation and ACL management with Aiven (the hosted Kafka provider). The topic name will be prefixed with your team namespace; thus, in the example below, the fully qualified topic name will be `myteam.mytopic`. This name will be set in the `.status.fullyQualifiedName` field on your `Topic` resource once the topic is synchronized to Aiven.

To add access to this topic for your application, see the next section: Accessing topics from an application.
Topic resources can only be specified in GCP clusters. However, applications may access topics from any cluster, including on-premises. For details, read the next section.
Currently, use the `nav-dev` pool for development, and `nav-prod` for production. If you need cross-environment communication, use the `nav-infrastructure` pool, but please consult the NAIS team before you do.
| Pool | Min. replication | Max. replication | Topic declared in | Available from |
|---|---|---|---|---|
| `nav-dev` | 2 | 3 | dev-gcp | dev-gcp, dev-fss |
| `nav-prod` | 2 | 9 | prod-gcp | prod-gcp, prod-fss |
| `nav-infrastructure` | 2 | 3 | prod-gcp | dev-gcp, dev-fss, prod-gcp, prod-fss |
ACLs
You must define ACLs on your topic to manage access to it. Access is granted to applications belonging to teams. Every ACL must specify a team, an application, and which access to grant. Possible access levels are `read`, `write`, and `readwrite`.

It is possible to use simple wildcards (`*`) in both team and application names; a wildcard matches any character any number of times. Be aware that, due to the way ACLs are generated and their length limits, the ends of long names can be cut off, eliminating any wildcard at the end.
```yaml
---
apiVersion: kafka.nais.io/v1
kind: Topic
metadata:
  name: mytopic
  namespace: myteam
  labels:
    team: myteam
spec:
  pool: nav-dev
  acl:
    - team: myteam
      application: ownerapp
      access: readwrite # read, write, readwrite
    - team: bigteam
      application: consumerapp1
      access: read
    - team: bigteam
      application: consumerapp2
      access: read
    - team: bigteam
      application: producerapp1
      access: write
    - team: producerteam
      application: producerapp
      access: write
    - team: trusted-team
      application: "*"
      access: read # all applications from trusted-team have read access
    - team: "*"
      application: aivia
      access: read # applications named aivia from any team have read access
    - team: myteam
      application: rapid-*
      access: readwrite # applications from myteam with names starting with `rapid-` have readwrite access
```
Configuration

Topics may be configured beyond the default settings for various use cases. Only a subset of all possible topic-level configurations is available.
```yaml
---
apiVersion: kafka.nais.io/v1
kind: Topic
metadata:
  name: mytopic
  namespace: myteam
  labels:
    team: myteam
spec:
  pool: nav-dev
  config: # optional; all fields are optional too; defaults shown
    cleanupPolicy: delete # delete, compact, or compact,delete
    maxMessageBytes: 1048588 # 1 MiB
    minimumInSyncReplicas: 2
    partitions: 1
    replication: 3 # see min/max requirements per pool
    retentionBytes: -1 # -1 means unlimited
    retentionHours: 168 # -1 means unlimited
    segmentHours: 168 # 1 week
  acl:
    - team: myteam
      application: ownerapp
      access: readwrite
```
Maximum message size

The `maxMessageBytes` configuration controls the largest record batch size allowed by Kafka. It has a default value of `1048588` (1 MiB) and a maximum value of `5242880` (5 MiB).

Danger

Generally speaking, Kafka is not designed to handle large messages. We recommend that you do not increase the `maxMessageBytes` value above the default unless absolutely necessary.
To keep your Kafka messages below the size limit, consider implementing strategies such as:

- using an efficient serialization format, such as Avro or Protobuf
- using compression, by setting the `compression.type` configuration for your producer(s)
- using patterns such as claim checks, or splitting messages into multiple segments

If you do increase the `maxMessageBytes` value, you will also need to configure all your producers and consumers to accommodate it.

For producers:

- set `max.request.size` equal to `maxMessageBytes`

For consumers:

- set `max.partition.fetch.bytes` equal to `maxMessageBytes`
Segment rolling

Each topic partition is split into segments. The `segmentHours` configuration controls the period of time after which Kafka will commit a segment, even if it is not full, to ensure that retention can delete or compact old data. It has a default value of `168` (1 week) and a minimum value of `1` (1 hour).

Setting this value lower can be useful for GDPR purposes, where you need to compact or delete data more regularly than the default setting of 1 week allows.
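As an illustration of the point above, a topic holding personal data could roll segments daily so that compaction can act on recent data sooner. The topic name and values below are hypothetical; choose settings that match your own retention requirements:

```yaml
---
apiVersion: kafka.nais.io/v1
kind: Topic
metadata:
  name: gdpr-topic # hypothetical example name
  namespace: myteam
  labels:
    team: myteam
spec:
  pool: nav-dev
  config:
    cleanupPolicy: compact # keep only the latest record per key
    segmentHours: 24       # roll segments daily instead of weekly, so compaction can run on newer data
```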
Data catalog metadata

If your topic exposes data meant for consumption by a wider audience, you should define some metadata describing the topic and its contents. This data will be automatically scraped and added to the internal data catalog. If the `catalog` key is set to `public`, the topic metadata is also published to the external data catalog and the National Data Catalog.

Syntax:

Use the following annotations, prefixed with `dcat.data.nav.no/`. Default values will be used where not supplied.
| Key | Importance | Comment | Example value | Default value |
|---|---|---|---|---|
| title | mandatory | String | Inntektskjema mottatt fra Altinn | topic name |
| description | mandatory | String | Inntektsmeldingen arbeidsgiveren sender fra eget lønns- og personalsystem eller fra altinn.no | |
| theme | recommended | A main category of the resource. A resource can have multiple themes, entered as a comma-separated list of strings. | inntekt | |
| keyword | recommended | A string or a list of strings | inntekt,arbeidsgiver,altinn | |
One or more of the following keys can also be supplied if the default values below are not sufficient:
| Key | Importance | Comment | Example value | Default value |
|---|---|---|---|---|
| temporal | optional | An interval of time covered by the topic: start and end date. Formatted as two ISO 8601 dates (or datetimes) separated by a slash. | 2020/2020 or 2020-06/2020-06 | current year/current year |
| language | optional | Two- or three-letter code. | NO | NO |
| creator | optional | The entity responsible for producing the topic. An agent (e.g. person, group, software or physical artifact). | NAV | team name |
| publisher | optional | The entity responsible for making the topic available. An agent (e.g. person, group, software or physical artifact). | NAV | NAV |
| accessRights | optional | Information about who can access the topic, or an indication of its security status. | internal | internal |
| license | optional | Either a license URI or a title. | MIT | |
| rights | optional | A statement that concerns all rights not addressed by license or accessRights, such as copyright statements. | Copyright 2020, NAV | Copyright year, NAV |
| catalog | optional | The catalog(s) where the metadata will be published. The value can be either internal (only visible within the organization) or public. | public | internal |
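Putting the tables above together, metadata is supplied as annotations on the `Topic` resource. The values below are illustrative, reusing the example values from the tables:

```yaml
---
apiVersion: kafka.nais.io/v1
kind: Topic
metadata:
  name: mytopic
  namespace: myteam
  labels:
    team: myteam
  annotations:
    dcat.data.nav.no/title: "Inntektskjema mottatt fra Altinn"
    dcat.data.nav.no/description: "Inntektsmeldingen arbeidsgiveren sender fra eget lønns- og personalsystem eller fra altinn.no"
    dcat.data.nav.no/theme: "inntekt"
    dcat.data.nav.no/keyword: "inntekt,arbeidsgiver,altinn"
    dcat.data.nav.no/catalog: "internal" # set to public to publish to the external catalogs
spec:
  pool: nav-dev
  acl:
    - team: myteam
      application: ownerapp
      access: readwrite
```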
Permanently deleting topic and data

Warning

Permanent deletes are irreversible. Enable this feature only as a step to completely remove your data.

When a `Topic` resource is deleted from a Kubernetes cluster, the Kafka topic is still retained, and the data kept intact. If you need to remove data and start from scratch, you must add the following annotation to your `Topic` resource:
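A sketch of the annotation in question (assumption: the annotation key below is taken from the NAIS Kafka documentation; verify it against the current docs before relying on it):

```yaml
---
apiVersion: kafka.nais.io/v1
kind: Topic
metadata:
  name: mytopic
  namespace: myteam
  annotations:
    kafka.nais.io/removeDataWhenResourceIsDeleted: "true" # opt in to permanent deletion
```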
When this annotation is in place, deleting the topic resource from Kubernetes will also delete the Kafka topic and all of its data.
Accessing topics from an application

Adding `.kafka.pool` to your `Application` spec will inject Kafka credentials into your pod. Your application needs to follow some design guidelines; see the next section on application design guidelines. Make sure that the topic name matches the `fullyQualifiedName` found in the `Topic` resource, e.g. `myteam.mytopic`.
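A minimal sketch of the relevant part of an `Application` spec (the application name and image are illustrative; the pool must match the pool the topic is declared in):

```yaml
---
apiVersion: nais.io/v1alpha1
kind: Application
metadata:
  name: ownerapp # illustrative; must match the application named in the topic's ACL
  namespace: myteam
  labels:
    team: myteam
spec:
  image: ghcr.io/navikt/ownerapp:latest # illustrative image reference
  kafka:
    pool: nav-dev # injects credentials for topics in this pool
```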
Created: 2021-09-15