K8s, Swarm, Traefik and Talkyard

By KajMagnus @KajMagnus 2018-11-01 10:11:27.960Z

It'd be good if Talkyard were simple to configure together with Kubernetes and Docker Swarm, and with Traefik or Caddyserver (proxy servers with automatic HTTPS).

Maybe these could be the first steps: (or "all" that's needed?)

  • Provide a sample Docker-Compose file with custom networking and volume containers, which one can copy and edit and append to one's own Docker-Compose or Swarm configuration.
  • This Docker-Compose sample file should assume that there's a reverse proxy somewhere (like, Traefik) that provides HTTPS. (Right?)
  • Figure out some way to take regular backups, although in Swarm and K8S, there's no Cron job on any host node that can do that.
  • Is there any good way to automatically upgrade the Talkyard images & container, when a new version is available?
  • Make it simpler to build Talkyard images — so people can fix Swarm / K8s related issues, and build and test, without running into confusion. (Here's a topic about that.)

There are 28 replies. Estimated reading time: 54 minutes

  1. Jerry Koerkenmeier @gkoerk 2018-11-01 10:54:49.769Z

    FULL DISCLOSURE: I run a Docker Swarm and would LOVE to get Talkyard working in a stack.

    Provide a sample Docker-Compose file with custom networking and volume containers, which one can copy and edit and append to one's own Docker-Compose or Swarm configuration.

    YES!

    This Docker-Compose sample file should assume that there's a reverse proxy somewhere (like, Traefik) that provides HTTPS. (Right?)

    YES!

    Figure out some way to take regular backups, although in Swarm and K8S, there's no Cron job on any host node that can do that.

    Yes. I have a very good way of handling this: add a second, nearly identical database image (which also contains the client) to the stack. It connects to the other DB, runs the relevant backup command (mysqldump, pg_dump, mongodump, etc.), sleeps for a defined interval between backups, and only retains backups for N days. That gives you full coverage as long as you back up the filesystem location where the backup files are stored.
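
The retention half of that dump, prune, sleep loop can be sketched in a few lines of Python. This is only an illustration under assumptions: the file pattern dump_*.sql.gz and the `keep` parameter are made up here, the dump command itself is left out, and it retains by count rather than by age to keep the sketch short.

```python
from pathlib import Path


def prune_backups(backup_dir, keep, pattern="dump_*.sql.gz"):
    """Delete all but the `keep` most recently modified backup files.

    Returns the names of the deleted files, sorted alphabetically.
    Files that do not match `pattern` are never touched.
    """
    if keep < 0:
        raise ValueError("keep must be >= 0")
    files = sorted(
        Path(backup_dir).glob(pattern),
        key=lambda p: p.stat().st_mtime,
        reverse=True,  # newest first
    )
    deleted = []
    for old in files[keep:]:
        old.unlink()
        deleted.append(old.name)
    return sorted(deleted)
```

A backup container would call something like prune_backups("/dump", keep=7) after each dump and then sleep until the next run.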

    I'm not well suited to answer the last two points, but I think they do belong in the list.

    Is there any good way to automatically upgrade the Talkyard images & container, when a new version is available?

    Ideally I think this would be handled by an "automated build" in the Docker repo which is triggered by changes in GitHub. Usually, providing a new version of the images and updating the :latest tag to point to the new images means all that end users should need to do is a docker stack deploy to have the image updated. One key to making this simple is to expose all the static, required-for-restore data in a volume that the user binds either to a Docker named volume or (preferably, to me) to a location on the (shared) storage using a bind mount. This way, the container itself is ephemeral. Just stop the old image and run the new one.

    BTW, here's a link on a possibly better and newer way to handle this: Docker Deploy Webhook, whereby you could provide instructions on automating new builds all the way into the swarm itself, requiring no manual effort at all.

    1. KajMagnus @KajMagnus 2018-11-05 09:42:22.396Z

      Ok :- ) I'll read about Swarm and services, start creating a sample Docker-Compose file, and, later when I've experimented a bit myself and know & understand all this a bit more, I'll reply to the things you wrote.

      1. Jerry Koerkenmeier @gkoerk 2018-11-05 23:45:35.694Z

        Want an example of a swarm .yml file with networking and traefik labels?

        1. KajMagnus @KajMagnus 2018-11-06 03:46:57.481Z

          Yes that'd be helpful

          1. Jerry Koerkenmeier @gkoerk 2018-11-06 18:04:17.111Z 2018-11-06 18:14:36.676Z

            First example is Nextcloud (nextcloud.yml). It will install the first time with docker stack deploy -c /path/to/nextcloud.yml nextcloud and bring you to an install screen.

            version: "3.0"
            
            services:
              
              nextcloud:
                image: nextcloud:latest
                env_file: /share/appdata/config/nextcloud/nextcloud.env
                networks:
                  - internal
                  - traefik_public
                depends_on: 
                  - db
                deploy:
                  labels:
                    - traefik.frontend.rule=Host:nextcloud.gkoerk.com
                    - traefik.docker.network=traefik_public
                    - traefik.port=80
                volumes:
                  - /share/appdata/nextcloud:/var/www/html
                  - /share/appdata/nextcloud/apps:/var/www/html/custom_apps
                  - /share/appdata/nextcloud/config:/var/www/html/config
                  - /share/appdata/nextcloud/data:/var/www/html/data
            
              db:
                image: mariadb:10
                env_file: /share/appdata/config/nextcloud/nextcloud.env
                networks:
                  - internal
                volumes:
                  - /share/runtime/nextcloud/db:/var/lib/mysql
            
              db-backup:
                image: mariadb:10
                env_file: /share/appdata/config/nextcloud/nextcloud-db-backup.env
                depends_on:
                  - db
                volumes:
                  - /share/appdata/nextcloud/database-dump:/dump
                  - /etc/localtime:/etc/localtime:ro
                entrypoint: |
                  bash -c 'bash -s <<EOF
                  trap "break;exit" SIGHUP SIGINT SIGTERM
                  sleep 2m
                  while /bin/true; do
                    mysqldump -h db --all-databases | gzip -c > /dump/dump_\`date +%d-%m-%Y"_"%H_%M_%S\`.sql.gz
                    (ls -t /dump/dump*.sql.gz|head -n $$BACKUP_NUM_KEEP;ls /dump/dump*.sql.gz)|sort|uniq -u|xargs -r rm --
                    sleep $$BACKUP_FREQUENCY
                  done
                  EOF'
                networks:
                - internal
            
              redis:
                image: redis:alpine
                depends_on:
                  - nextcloud
                networks:
                  - internal
                volumes:
                  - /share/runtime/nextcloud/redis:/data
            
              solr:
                image: solr:6-alpine
                depends_on:
                  - nextcloud    
                networks:
                  - internal
                volumes:
                - /share/runtime/nextcloud/solr:/opt/solr/server/solr/mycores
                entrypoint:
                  - docker-entrypoint.sh
                  - solr-precreate
                  - nextant
            
              cron:
                image: nextcloud
                volumes:
                  - /share/appdata/nextcloud:/var/www/html
                depends_on:
                  - nextcloud
                user: www-data
                networks:
                  - internal
                entrypoint: |
                  bash -c 'bash -s <<EOF
                    trap "break;exit" SIGHUP SIGINT SIGTERM
                    while [ ! -f /var/www/html/config/config.php ]; do
                      sleep 1
                    done
                    while true; do
                      php -f /var/www/html/cron.php
                      sleep 15m
                    done
                  EOF'
            
            networks:
              traefik_public:
                external: true
              internal:
                driver: overlay
                ipam:
                  config:
                    - subnet: 172.16.254.0/24
            

            The definition of the "traefik_public" network is external and created via docker network create --driver=overlay --subnet=172.1.1.0/21 --attachable traefik_public

            You can see that enabling Traefik (once it's already running) for these containers is as simple as giving them appropriate label values for Traefik to interpret. NOTE: in regular docker-compose, the labels are applied at the same level as networks:, environment:, etc., but in a Swarm stack file they are placed under the deploy: section.

            The key is that while all your individual containers can speak to one another easily (since they are on the same overlay network -- named "internal"), they cannot speak to traefik unless they are on the same network it is. So we've added significant security to our stack by keeping every other service shielded from direct Internet access.
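
To make the label placement concrete, here is a small Python sketch that builds the same three Traefik labels used in the Nextcloud file and puts them either at service level (plain Docker-Compose) or under deploy: (Swarm). The function names and the example hostname are invented for illustration; only the label keys come from the example above.

```python
def traefik_labels(host, network="traefik_public", port=80):
    """Return the three Traefik labels used in the Nextcloud example."""
    return [
        f"traefik.frontend.rule=Host:{host}",
        f"traefik.docker.network={network}",
        f"traefik.port={port}",
    ]


def service_with_labels(image, host, swarm):
    """Build a service definition as a plain dict.

    swarm=True  -> labels go under deploy: (Swarm stack file)
    swarm=False -> labels go at service level (regular docker-compose)
    """
    service = {"image": image, "networks": ["internal", "traefik_public"]}
    labels = traefik_labels(host)
    if swarm:
        service["deploy"] = {"labels": labels}
    else:
        service["labels"] = labels
    return service
```

Only the service that Traefik should reach needs the labels and the traefik_public network; db, redis and the others stay on internal alone.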

            1. Jerry Koerkenmeier @gkoerk 2018-11-06 23:40:00.069Z

              I can also share my Traefik config if you like.

              1. KajMagnus @KajMagnus 2018-11-07 04:32:36.111Z

                Yes please

                1. Jerry Koerkenmeier @gkoerk 2018-11-07 13:01:38.960Z

                  On it. By the way, is the talkyard-web Docker image (which runs NGINX) configured to serve as an SSL termination point, or does it act as a passthrough proxy? I think that's my only issue right now. I have it running in a VM, with Traefik proxying for the VM, but then I get errors. Can I bypass NGINX and point Traefik directly to talkyard-app (maybe I would need to expose a port)? If so, which port? Or else, how can I change the default NGINX config so that conf/sites-enabled-manual/talkyard-servers.conf will serve as a pass-through proxy only? I think I would need to change these settings; maybe some are in the image at /etc/nginx/*.conf?

                  server {
                    listen 80      backlog=8192;   # about backlog: see above [BACKLGSZ]
                    # Using ipv6 here, can prevent Nginx from starting, if the host OS has disabled ipv6,
                    # Nginx then won't start and says:
                    #    [emerg] socket() [::]:80 failed (97: Address family not supported by protocol)
                    #listen [::]:80 backlog=8192;
                  
                    server_name _;
                  
                    ## To redirect to HTTPS, comment out these includes, and comment in "return 302 ..." below.
                    include /etc/nginx/server-limits.conf;
                    include /etc/nginx/server-locations.conf;
                  
                  1. KajMagnus @KajMagnus 2018-11-09 15:32:12.624Z

                    Nginx does some different things: Rate & bandwidth limiting. Caching and serving uploaded files & assets. Websocket / long polling.

                    And, optionally, terminates SSL/TLS. I think I'd like to move the TLS things to Traefik (in a Docker container), mainly because Traefik has automatic HTTPS. People who already have their own reverse proxy could then comment out Talkyard's Traefik container in Talkyard's docker-compose/stack.yml file.

                    I would expect removing Nginx to result in weird things. ... Hmm, what are the errors you encounter, with Nginx?

                    Which docker-compose.yml file did you base your installation on? The one in https://github.com/debiki/talkyard is intended for development and just for building prod images. Maybe I should document this better. There's another Compose-file intended for production, here: https://github.com/debiki/talkyard-prod-one/blob/master/docker-compose.yml.

                    (And I have in mind to add a docker-stack.yml file too, for people who want to use Swarm and maybe build their own images)

    2. Progress
      with doing this idea
    3. @KajMagnus marked this topic as Planned 2018-11-05 09:42:25.776Z.
    4. @KajMagnus marked this topic as Started 2019-01-23 08:02:18.008Z.
    5. KajMagnus @KajMagnus 2019-03-06 13:06:06.396Z 2019-03-07 04:21:58.675Z

      Hi @gkoerk @mcku, if you're still interested in Talkyard + Docker-Compose / Swarm. And, Hi, others:

      I've created a docker-compose.yml file, and instructions, for integrating Talkyard into one's already existing Docker-Compose stack, together with Traefik or some other reverse proxy. Swarm should be similar — I think the difference is that, with Swarm, one needs to run envsubst to insert one's custom settings, since there's no Docker-Compose .env file to edit.

      See this Git repo: https://github.com/debiki/talkyard-prod-swarm
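
For readers who have not used envsubst: it replaces ${NAME} placeholders in a template with values from the environment. A rough Python equivalent, as a sketch only, is shown here. The function name is made up, "debiki" is just an example repository value, and the two variable names are the ones mentioned for the .env file.

```python
from string import Template


def render_stack_file(template_text, settings):
    """Fill ${NAME} placeholders in a stack file template.

    Unlike envsubst, which silently inserts an empty string for an unset
    variable, this raises KeyError so that a missing setting is noticed.
    """
    return Template(template_text).substitute(settings)


template = "image: ${DOCKER_REPOSITORY}/talkyard-app:${TALKYARD_VERSION_TAG}\n"
settings = {
    "DOCKER_REPOSITORY": "debiki",              # example value
    "TALKYARD_VERSION_TAG": "v0.6.43-b2528e2",  # example value
}
print(render_stack_file(template, settings), end="")
# prints: image: debiki/talkyard-app:v0.6.43-b2528e2
```

Failing on a missing variable is a deliberate difference from envsubst: an empty version tag would otherwise only show up later as a confusing image pull error.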

      How it works

      1. You clone the repo, configure settings (hostname, memory, email server).
      2. git pull to download the most recent docker-compose.yml file, which references the most recent Talkyard version.
      3. And then: docker-compose up.

      Also, edit your reverse proxy so it'll forward traffic to Talkyard. Included in the talkyard-prod-swarm repo, is an example of how to do this, with Traefik.

      Upgrading to new Talkyard versions

      Do git pull to get a new docker-compose.yml file with new Talkyard version numbers. And then docker-compose down ; docker-compose up to download new images and restart. This approach — with git pull:ing a completely new docker-compose.yml file, lets us (the people who build Talkyard) make major changes to the tech stack (like, adding a new backend server, or changing from ElasticSearch to Toshi, just an example), without you having to even know about this.
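
      If you would rather script those upgrade steps than type them, a minimal Python sketch could look like this. It assumes git and docker-compose are on the PATH and that the repo was cloned into the directory you pass in; the `run` parameter exists only so the function can be tried without actually touching Docker.

```python
import subprocess

# The same three commands as in the text, in order.
UPGRADE_STEPS = [
    ["git", "pull"],
    ["docker-compose", "down"],
    ["docker-compose", "up", "-d"],
]


def upgrade(workdir, run=subprocess.run):
    """Run the upgrade steps in order inside `workdir`.

    check=True makes a failing step raise CalledProcessError, so a failed
    `git pull` never leads to `docker-compose down`.
    """
    for step in UPGRADE_STEPS:
        run(step, cwd=workdir, check=True)
```

Stopping at the first failure matters here: taking the site down after a failed pull would only restart the old version, with downtime for nothing.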

      Backups

      In the future, we'll probably add a Docker container that makes daily / hourly backups and saves them in a Docker volume of your choice, which you can back up off-site.

      Your custom images

      You can edit .env and use your own DOCKER_REPOSITORY=... and your own TALKYARD_VERSION_TAG=... if you build your own images and versions.

      1. @Huxy 2019-03-28 19:17:43.872Z replies to KajMagnus:

        Hi @KajMagnus, hope you're well!

        I've managed to get Talkyard up and running on my swarm and I'm left with a couple of questions I'm hoping you can answer. I'm still very new to swarm and k8s but I was going to use persistent volumes to maintain data integrity across nodes. In effect this would allow me to move services between the hosts with little to no downtime. Looking at the backup scripts from my dedicated instance, I can see that the cache is backed up. What would happen if the cache wasn't backed up? Is it integral to the application or will Talkyard just rebuild it?

        To summarise: If I run persistent volumes for uploads and the db data, will that be enough to provide a working Talkyard instance?

        Also, any chance your talkyard app Docker image could support secrets? It would save me from having to add them as plaintext.
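
For context on what "supporting secrets" usually means: Swarm mounts each secret as a file under /run/secrets/ inside the container, and many images accept a NAME_FILE variable pointing at that file as an alternative to NAME. A minimal Python sketch of that lookup follows. DB_PASSWORD is a hypothetical setting name, and nothing here reflects how the Talkyard image actually reads its config.

```python
import os


def read_setting(name, environ=None):
    """Return a setting from NAME_FILE (a mounted secret) or from NAME.

    The NAME_FILE convention is borrowed from other images; it is an
    assumption here, not something Talkyard is known to support.
    """
    env = os.environ if environ is None else environ
    path = env.get(name + "_FILE")
    if path:
        # A secret file wins over a plaintext environment variable.
        with open(path, encoding="utf-8") as f:
            return f.read().strip()
    return env.get(name)
```

With this pattern, a stack file would set DB_PASSWORD_FILE=/run/secrets/db_password instead of putting the password itself in the environment.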

        1. KajMagnus @KajMagnus 2019-03-29 12:20:28.453Z 2019-03-29 12:44:26.294Z replies to Huxy:

          Hi @Huxy nice hearing from you. Actually I have a cold otherwise I'm well :- )

          It's fine not to back up the cache. It's just Redis, and everything gets repopulated, after restart, as needed.

          persistent volumes for uploads and the db data, will that be enough

          Yes, well, you'd like to back up the Play Framework and Nginx config files too, and, optionally, any LetsEncrypt HTTPS certificate.

          Since you use custom volumes, probably you have written your own backup scripts? I'd say you'll need to test reinstalling everything from backup, on a 2nd server to know for sure if the backups work :- )

          B.t.w. can I ask what kind of persistent volumes do you use? And, would it be ok if I had a look at the backup scripts? (Maybe they can give me ideas about how to improve the default backup scripts.)

          About secrets: I probably won't have time to look into that in the near future. Seems like a good thing to support, though, some time later when I re-visit Docker Swarm. (Or if you'd like to see if you can get it working and send a PR. However, I think that'd take lots of time and frustration.)

          1. @Huxy 2019-03-29 13:58:55.300Z replies to KajMagnus:

            I have a cold as well at the moment, so I feel your pain!
            Thanks for the reply; this was what I was expecting. It got confusing as there's quite a lot of documentation for clustering Redis over multiple nodes, but my understanding is that it was designed as a memory-only store and should be dispensable.

            Yes, well, you'd like to back up the Play Framework and Nginx config files too, and, optionally, any LetsEncrypt HTTPS certificate.

            I don't need to worry about the LetsEncrypt certs as I'm using Traefik in my Swarm. I just add the relevant tags and overlay network and it's good to go. I'm also storing the play framework configuration (using configs) in the Swarm directly and passing it through to the container. They work like secrets but aren't encrypted at rest and are less secure. It means any node in the swarm can access the same config file.

            Since you use custom volumes, probably you have written your own backup scripts? I'd say you'll need to test reinstalling everything from backup, on a 2nd server to know for sure if the backups work :- )

            Well, I haven't written anything quite yet, this is what I'm working on at the moment. I originally looked at ways of maintaining HA and replicas of the data set. I think an ideal setup would be:

            • Master PostgreSQL with streaming to replica slave PostgreSQL servers
            • Redis cluster with local volumes
            • Shared volume for uploads i.e. s3 bucket or nfs
            • Nginx and App instances

            This would provide availability but would still require backups of data to another location. However, it became clear that PostgreSQL cannot handle failover automatically, and the more I looked into it, the further down the rabbit hole it went.

            B.t.w. can I ask what kind of persistent volumes do you use? And, would it be ok if I had a look at the backup scripts? (Maybe they can give me ideas about how to improve the default backup scripts.)

            In the end I've decided to use a persistent storage solution, where the database data is replicated between multiple nodes. Then if the server goes down, you just spin up a new one on a node that has access to the shared storage. I considered Gluster, Ceph and SeaweedFS before settling on Portworx.

            Portworx has a free developer license that allows use of 3 storage nodes. I use this to store volumes that are replicated across these nodes. As each node stores their own copy of the data, access to the data is very fast, you just need to ensure the service launches on a node which has the Portworx storage available.

            It kind of meets the need for HA and data replication; however, I still need to find a way of manually exporting the database to an external source as it won't protect from corruptions, deletions etc. My solution is really just like RAID across machines. Could we not add backup scheduling to the admin panel in talkyard; provide a destination and a schedule? Rsync might be a good start. I know containers are all about isolating tasks, so some might argue that a backup scheduler should be its own image instead though, which is fair.

            About secrets: I probably won't have time to look into that in the near future. Seems like a good thing to support, though, some time later when I re-visit Docker Swarm. (Or if you'd like to see if you can get it working and send a PR. However, I think that'd take lots of time and frustration.)

            Things at work are a little hectic at the moment and I don't have much docker dev experience, but if I get some time, I'll give it a go.

            1. KajMagnus @KajMagnus 2019-04-02 04:41:12.080Z replies to Huxy:

              Interesting to hear about the setup. I think the HA setup you mentioned, sounds good.

              Actually there's one issue: the Play Framework application uses PostgreSQL's Serializable transaction isolation level, which causes transaction rollbacks somewhat often if many threads or servers write to the PostgreSQL database at the same time. So, to be able to have many Play Fmw servers, some work needs to be done to avoid different servers writing to the database (or at least to the same parts of the same tables) at the same time. Maybe there could be a Master Play Fmw server that does all the writing, just like there's a Master PostgreSQL database.
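
A single writer is one way around this. The other common pattern with Serializable isolation is to rerun a rolled-back transaction, since PostgreSQL reports such rollbacks with SQLSTATE 40001 (serialization_failure). The Python sketch below shows that retry pattern in general terms. It is not Talkyard code (the app is a Play Framework application), and SerializationFailure is a stand-in for whatever error the database driver raises.

```python
import random
import time


class SerializationFailure(Exception):
    """Stand-in for a driver error with SQLSTATE 40001 (serialization_failure)."""


def run_with_retry(transaction, max_attempts=5, base_delay=0.05, sleep=time.sleep):
    """Call `transaction()` and retry it from scratch when it is rolled back.

    The whole transaction is rerun, not just the failed statement, because
    what it read earlier may no longer be valid.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return transaction()
        except SerializationFailure:
            if attempt == max_attempts:
                raise
            # Exponential backoff with jitter, so that competing writers
            # do not all retry at the same instant.
            sleep(base_delay * 2 ** (attempt - 1) * random.uniform(0.5, 1.5))
```

Retries help with occasional conflicts. If several servers constantly write to the same rows, they would mostly be retrying, which is where the single-writer idea becomes attractive.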

              backups of data to another location

              That's an interesting topic I think. I'm thinking about implementing [every 10 minutes incremental backup to an off-site server], by providing a "site map" of all uploaded files, + every 10 minutes, creating JSON files with [any textual contents that got added recently]. Then an off-site server with an API key, could use wget or httrack to mirror those uploads and JSON files.

              (And this could be combined with daily full PostgreSQL database dumps, + archiving all uploads into a .tar archive and copying off-site.)
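
The "what changed in the last 10 minutes" part of such an incremental backup can be sketched as a walk over the uploads directory that compares modification times. This is only an illustration of the idea: the function name is made up, and the real "site map" and JSON export described above do not exist yet.

```python
import os


def changed_since(root, since_epoch):
    """Return relative paths of files under `root` modified after `since_epoch`.

    `since_epoch` is a Unix timestamp, e.g. the time of the previous run.
    """
    changed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if os.path.getmtime(full) > since_epoch:
                changed.append(os.path.relpath(full, root))
    return sorted(changed)
```

An off-site server would then fetch only the returned paths, and remember the timestamp of its last successful run for the next call.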

              Also, about HA: I've been using Google Compute Engine for a few years, and as far as I know, it's had approx 100% uptime. Google live-migrates one's VM to another physical machine if the current one fails or needs maintenance. I'm wondering whether just using a GCE virtual machine results in more HA than anything one can build oneself. (With Amazon EC2 though, which I was using before GCE, I sometimes got messages that they were going to do maint work, and then the VM would shut down completely for a short while.)

              persistent storage solution; where the database data is replicated between multiple nodes. ... Portworx.

              Portworx sounds interesting, I hadn't heard about it before. Can I ask what made you settle for Portworx? (instead of, say, https://ceph.com/ceph-storage/block-storage/ )

              Could we not add backup scheduling to the admin panel in talkyard; provide a destination and a schedule? Rsync might be a good start

              That sounds like a good idea, to include in the admin panel. I've somewhat done this, however, via Cron and a Bash script. It's in the talkyard-prod-one installation steps; here're the instructions:

              https://github.com/debiki/talkyard-prod-one/blob/master/docs/copy-backups-elsewhere.md

              Actually, because of ransomware attacks, I think rsync should run on the off-site server. However, the backup frequency, could be in the admin panel, + instructions about how to create the off-site rsync script.

              backup scheduler should be its own image

              Yes, I have in mind to create a backup image as part of talkyard-prod-swarm, that creates daily / hourly full site backups, and saves in a .tar archive, which an off-site server can then rsync-download to itself.

              1. @Huxy 2019-04-15 08:19:09.428Z replies to KajMagnus:

                That's an interesting topic I think. I'm thinking about implementing [every 10 minutes incremental backup to an off-site server], by providing a "site map" of all uploaded files, + every 10 minutes, creating JSON files with [any textual contents that got added recently]. Then an off-site server with an API key, could use wget or httrack to mirror those uploads and JSON files.

                That sounds like a good idea, to include in the admin panel. I've somewhat done this, however, via Cron and a Bash script. It's in the talkyard-prod-one installation steps; here're the instructions:

                In hindsight, I'd probably use some sort of object store for my backups. So you could either save those files directly into a mounted volume (s3fs, nfs, cifs, local) or provide a backup container that supports syncing the data to those types of destinations, i.e. wget, httrack, s3fs, nfs client etc. For me an ideal solution would be providing the bucket with a token and having it uploaded automatically with the minimum of fuss. If this could be managed from the UI as per your suggestion, then all the better :)

                I'm wondering just using a GCE virtual machine, results in more HA, than anything one can build oneself. (With Amazon EC2 though, which I was using before GCE

                You are right of course, but it is not cheap to use GCE. I use VPS root servers and need the ability to scale up and down services due to things like maintenance. I would rather not perform an OS upgrade/reboot and have all my services go down. Anecdotally, in all my years of Dedicated Servers and VPS use, I have only had issues once (outside of scheduled maintenance) which caused downtime, so I agree that these cloud providers often provide good HA out of the box.

                Portworx sounds interesting, I hadn't heard about it before. Can I ask what made you settle for Portworx?

                Persistent storage solutions for Docker Swarm are few and far between. Portworx was the only storage orchestrator other than StorageOS that I got working with Swarm, but I chose Portworx because it can use Consul and I already use it for my cluster. Ceph would work, but I have no experience of using it, and it is more complex to set up than Portworx. It would also require third-party plugins to provide support for Docker volumes. On a side note, I have concerns about the longevity of Swarm even though it's far easier to use than k8s. It seems like there were originally quite a few solutions for storage, but over the years the focus of many solutions has moved to Kubernetes. Docker even bought up Infinit, which seemed promising but now appears to be a dead project.

                1. @Huxy 2019-04-15 08:51:34.532Z replies to Huxy:

                  Forgot to mention that on my journeys I also found a project called Stolon.

                  It seems like an interesting solution to HA cloud postgres. It does, however, provide a lot of containers to handle the service availability. If you included a HA postgres deployment with every service stack, they'd soon stack up ;)

                  1. KajMagnus @KajMagnus 2019-08-27 04:50:50.535Z 2019-08-27 05:05:08.859Z

                    James @Huxy, now I've upgraded the docker-compose.yml file for Traefik + Swarm or Docker to the newest Talkyard version.

                    I don't know precisely how your config files look, but the change you need to do, to get the latest version (with user groups and invite-to-group), is this:
                    https://github.com/debiki/talkyard-versions/compare/3ff8bf3...d6b424c

                    That is, bumping the version number from v0.6.22-.... to v0.6.43-b2528e2.

                    (And fixing a typo: INTERNAL_NET_SUBNTET should have been INTERNAL_NET_SUBNET (...NET not ...NTET). But I think you use different config files and the typo fix won't affect you.)

                    And then, to upgrade, what I did in my installation that uses Traefik + Docker-Compose:

                    cd /opt/talkyard-prod-swarm   # but I use Compose not Swarm
                    docker-compose down
                    docker-compose up -d  # downloads new images, does database migration, starts everything
                    docker-compose logs -f --tail 999 app   # all looks fine?
                    
                    1. KajMagnus @KajMagnus 2019-08-27 05:34:03.833Z replies to Huxy:

                      On a side note I have concerns about the longevity of Swarm even though it's far easier to use than k8s

                      ( Me too, look at the commit activity: https://github.com/docker/swarm/graphs/contributors,
                      and compare with Docker: https://github.com/docker/docker-ce/graphs/contributors )

                      1. Christofer J Ekervhén @ChrisEke 2021-01-20 11:01:28.179Z replies to KajMagnus:

                        Hej @KajMagnus and all,
                        Thank you for this great piece of software!

                        I've successfully managed to deploy Talkyard on Kubernetes for my new personal blog. If anyone else is interested, I have set up a repository, github.com/ChrisEke/talkyard-k8s, which includes some basic instructions in the README.

                        Right now I've used a couple of rough shortcuts to override Postgres and Elasticsearch logging to file and instead point it to stdout. Maybe we could look into including an env_var toggler for this instead?

                        Also, naming for the components currently maps 1:1 to the docker_compose spec, e.g. app is called app, search=search etc. This is not optimal, as it can become a bit unclear that these belong to talkyard in shared namespaces. The reason for this is that I had trouble overriding the default search hostname in play-framework.conf, with for example: talkyard.elastisearch.host="talkyard-search". At that point I decided to just use the default hostnames for now.

                        I plan to write a bit more in-depth deployment guide when time allows.

                        1. KajMagnus @KajMagnus 2021-01-20 22:47:09.078Z replies to ChrisEke:

                          Hi @ChrisEke , this looks interesting, & got me thinking :- )

                          One tricky thing, could be automatic upgrades — currently there're exact version numbers in the K8s config, e.g.:
                          https://github.com/ChrisEke/talkyard-k8s/blob/73c79282989a1dce6693ec650d0298da098734b9/manifests/app-Deployment.yaml#L7

                          kind: Deployment
                          metadata:
                            labels:
                              ...
                              app.kubernetes.io/version: v0.2021.01-923ae76d3
                          

                          I think most server root admins would not bump those numbers, manually, in a timely manner (e.g. if there's a security issue).

                          If you have a look in the Talkyard-Versions repo (https://github.com/debiki/talkyard-versions),
                          there's this Docker-Compose config template file:
                          https://github.com/debiki/talkyard-versions/blob/master/.docker-compose.template.yml
                          with ${TALKYARD_VERSION_TAG} instead of specific version numbers
                          — and there's a script, https://github.com/debiki/talkyard-versions/blob/master/.docker-compose.template.envsubst.sh,
                          which replaces the ${TALKYARD_VERSION_TAG} with the latest version.

                          Snippet:

                          services:
                            web:
                              image: ${DOCKER_REPOSITORY}/talkyard-web:${TALKYARD_VERSION_TAG}
                            ...
                            app:
                              image: ${DOCKER_REPOSITORY}/talkyard-app:${TALKYARD_VERSION_TAG}
                          

                          The idea is that, when releasing a new version, the Ty maintainer(s) also run this envsubst script,
                          to generate a new docker-compose.yml with the most recent image version numbers.
                          Then, self hosted Talkyard servers could run a Cron job that regularly fetches any new docker-compose.yml file,
                          and deploys the most recent images.

                          Don't know if something similar can be done also with K8s.
                          A Cron job that pulls from the repo, possibly gets new manifest files, with new version numbers?
                          And then redeploys to K8s? Hmm.
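
Whatever does the redeploying (Cron plus Compose, or something on the K8s side), the job first has to decide whether the freshly pulled file references new images at all. A rough Python sketch of that check follows. It is a plain text match on image: lines, not a YAML parser, and both function names are invented for illustration.

```python
import re

# Matches lines such as "    image: debiki/talkyard-app:v0.6.43-b2528e2".
IMAGE_LINE = re.compile(r"^[ \t]*image:[ \t]*(\S+)[ \t]*$", re.MULTILINE)


def image_refs(config_text):
    """Return the set of image references (repo/name:tag) found in the text."""
    return set(IMAGE_LINE.findall(config_text))


def needs_redeploy(old_text, new_text):
    """True if the freshly pulled file references different images."""
    return image_refs(old_text) != image_refs(new_text)
```

The same comparison works for Kubernetes manifests that use image: lines, so a job could pull the repo, call needs_redeploy, and only then apply the new files.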

                          1. KajMagnus @KajMagnus 2021-01-20 22:49:34.201Z 2021-01-20 22:59:26.446Z replies to ChrisEke:

                            can become a bit unclear that these belong to talkyard in shared namespaces

                            Hmm ok it'd be nice if the container names could be configurable. Right now they're hardcoded here and there, e.g. in Nginx:

                            proxy_pass http://app:9000/-/websocket;
                            

                            Maybe better default names, could be tyapp and tysearch etc instead of app and search.

                            ***

                            Right now I've used a couple of rough shortcuts to override Postgres and Elasticsearch log to file and instead point to stdout. Maybe we could look into including an env_var toggler for this instead?

                            Postgres:

                            I didn't know (until now) that Postgres supports a config file include 'other-file.conf' directive.
                            Maybe Ty's own built-in Postgres config could include a file that by default was empty, but which one could edit,
                            and override any default Ty Postgres settings, e.g. logging.

                            https://www.postgresql.org/docs/current/config-setting.html

                            the postgresql.conf file can contain include directives, which specify another file to read and process as if it were inserted into the configuration file at this point.

                            E.g. override any of these log config settings:
                            https://www.postgresql.org/docs/current/runtime-config-logging.html

                            And ElasticSearch too in some way, yes that'd be good.

                            An env_var toggler: could that be something like LOG_DEST=/path/to/file or =/dev/stdout or maybe just LOG_TO_STDOUT=1 instead?
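To make the two options concrete, here is a minimal sketch of how a container start script might resolve them. LOG_TO_STDOUT and LOG_DEST are the names floated above; the function name and the default file path are hypothetical, and giving LOG_TO_STDOUT priority over LOG_DEST is just one possible choice.

```shell
# Sketch: choose a log destination from environment variables.
# The default path below is a made-up example, not an actual Talkyard setting.
resolve_log_dest() {
  local default_dest="${1:-/var/log/talkyard/app.log}"
  if [ "${LOG_TO_STDOUT:-0}" = "1" ]; then
    echo "/dev/stdout"          # the simple on/off toggle wins
  elif [ -n "${LOG_DEST:-}" ]; then
    echo "$LOG_DEST"            # explicit destination, file or device
  else
    echo "$default_dest"        # fall back to the built-in default
  fi
}

# Possible use in a start script (hypothetical server command):
#   exec some-server >> "$(resolve_log_dest)" 2>&1
```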

                            1. C
                              Christofer J Ekervhén @ChrisEke2021-01-21 09:28:45.798Zreplies toKajMagnus:

                              Thanks for taking time to look at what I've come up with so far. Much appreciated!

                              One tricky thing, could be automatic upgrades — currently there're exact version numbers in the K8s config, e.g.:
                              https://github.com/ChrisEke/talkyard-k8s/blob/73c79282989a1dce6693ec650d0298da098734b9/manifests/app-Deployment.yaml#L7

                              You are right, automatic upgrades in k8s are tricky and there's no real native solution for this. I'm a bit of a skeptic myself towards automatic updates and prefer to control exactly the version running in production while vetting new versions in a staging environment. The typical deployment flow for k8s is that version updates are somehow included in CI/CD pipelines for the specific k8s cluster. In other words, the responsibility to perform timely updates is on the end user. If that is too much to ask, I would say your SaaS offering of Talkyard should be recommended instead :)

                              The TY version is not really hard-coded -- although it might appear so in the ./manifests directory and the included manifests. These manifests are generated by running ./build.sh, which fetches the latest TY version from the version repo (WIP excluded):

                              ...
                              talkyard_version_url="https://raw.githubusercontent.com/debiki/talkyard-versions/master/version-tags.log"
                              talkyard_version_output="lib/talkyard/talkyard-version.libsonnet"
                              ...
                              # Get latest Talkyard version
                              latest_version=$(curl -S --silent --fail $talkyard_version_url | grep -v -e "WIP\|^$" | tail -1; test ${PIPESTATUS[0]} -eq 0)
                              echo "{ _version+:: '"$latest_version"' }" > $talkyard_version_output
                              

                              The idea is that when a new version is released upstream, new manifests are generated and a new release is tagged in the talkyard-k8s repo. Right now I have not automated this, but it should be easily done with e.g. an action that scrapes a URL or an Atom/RSS feed.
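The check such an automated action would need could be sketched like this. It reuses the filtering idea from build.sh (drop WIP and empty lines, take the last line) and assumes, as that script does, that the newest tag is the last line of version-tags.log. Both function names are hypothetical.

```shell
# Sketch: pick the newest non-WIP tag from a version-tags.log style listing on stdin.
latest_release_tag() {
  grep -v -e 'WIP' -e '^$' | tail -n 1
}

# Succeeds if an upstream tag was found and it differs from the recorded one.
needs_new_release() {
  local current="$1" latest="$2"
  [ -n "$latest" ] && [ "$latest" != "$current" ]
}

# Possible use in a scheduled job ($current read from wherever the repo records it):
#   latest="$(curl --silent --show-error --fail "$talkyard_version_url" | latest_release_tag)"
#   if needs_new_release "$current" "$latest"; then ./build.sh; fi
```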

                              Upgrading is then as simple as running jb update && tk apply environments/default for Tanka, or with Kustomize: kustomize build . | kubectl apply -f - if tracking the main branch. This can then somehow be incorporated in the end user's deployment process, automatic or not. The same should be doable for another popular packaging/templating tool for k8s called Helm.

                              Maybe better default names, could be tyapp and tysearch etc instead of app and search.

                              Yes! That would be a better identifier.

                              Regarding logging, either way should work fine. E.g. in the container image, use minimal default settings and use include_file directives to read additional configuration from a volume or ENV at container run time. I think the easiest approach is to just use each vendor application's default logging settings and make these easy to override, depending on what type of deployment is performed.

                              But for TY app itself a LOG_TO_STDOUT=1 would be great! :)

                              1. KajMagnus @KajMagnus2021-01-29 02:40:14.048Z2021-01-29 03:38:01.571Zreplies toChrisEke:

                                I'm a bit of a skeptic myself towards automatic updates and prefer to control exactly the version running in production while vetting new versions in a staging environment

                                Ok, seems like a good idea. Since 5 days ago, a new version is available: v0.2021.02.

                                What could be a way for you and others to get to know about new versions?

                                Maybe I could announce new versions in the Announcements category.
                                And if you want to, you could subscribe to that category?

                                ***

                                B.t.w. I'm adding release channels to Talkyard, with names inspired by Kubernetes:
                                Regular, Rapid and Stable channels. Actually, only Regular, for the coming months (years?).

                                There could also be a Regular-Next channel, where the upcoming Regular version would appear, before it became the default Regular version. So one can try out the upcoming version, in a test/staging environment, before it becomes officially released. (Then one can report bugs to Ty, or maybe fix any integration incompatibilities, before they happen for real.)

                                ***

                                The TY version is not really hard-coded -- although it might appear so in the ./manifests directory and the included manifests. These manifests are generated by running ./build.sh, which fetches the latest TY version from the version repo

                                That sounds good :- )   just that I didn't find that code. Thanks for the code snippet.

                                This: talkyard_version_url="https://raw.githubusercontent.com/debiki/talkyard-versions/master/version-tags.log"
                                would, with the Regular release channel, be:

                                talkyard_version_url="https://raw.githubusercontent.com/debiki/talkyard-versions/tyse-v0-regular/version-tags.log"
                                

                                But master works too, for backwards compatibility. (The master branch == the tyse-v0-regular branch == the regular release channel.)
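A small sketch of how a build script might make the branch configurable. The function name is hypothetical, and only the tyse-v0-regular branch (with master as its alias) is named here; branch names for any other channels are not known and would have to be passed in once they exist.

```shell
# Sketch: build the version-tags.log URL for a given branch of the versions repo.
# Defaults to the Regular channel's branch.
version_tags_url() {
  local branch="${1:-tyse-v0-regular}"
  echo "https://raw.githubusercontent.com/debiki/talkyard-versions/${branch}/version-tags.log"
}

# Possible use:
#   talkyard_version_url="$(version_tags_url)"          # Regular channel
#   talkyard_version_url="$(version_tags_url master)"   # backwards-compatible alias
```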

                                I have not automated this, but it should be easily done with e.g. an action that scrapes a URL or an Atom/RSS feed

                                Ok

                                ***

                                Renaming to tyapp, tysearch etc: I think that'll have to wait, I need to think about whether that could cause problems with existing installations.

                                I'll try to add LOG_TO_STDOUT=1 for Nginx at least, in the upcoming version. (And the other containers, later. Hmm with "TY app" did you mean the "app" container, or all containers :- ) Indeed it'd have been nice if it was named tyapp instead, not ambiguous)

                                Edit: "... rough shortcuts to override Postgres and Elasticsearch log to file and instead point to stdout". Oh, it's primarily those two that are annoying to reconfigure? (so they log to stdout)

                                Hmm, currently the app container logs to both stdout and to /var/log/talkyard/ — seems unnecessary to write to /var/log/ when using K8s. (And unnecessary to write to stdout when using Docker-Compose.)

                                1. C
                                  Christofer J Ekervhén @ChrisEke2021-01-29 15:35:56.941Zreplies toKajMagnus:

                                  Hmm thought I had my manual version tracking covered for the moment with watching the repo via GitHub. But the last update to my feed was c12e790 Add v0.2021.02-WIP-879ef3fe1. 14 days ago.

                                  Thanks for letting me know of the new version - I've upped the version and created a new release in talkyard-k8s repo. If it is easy for you to implement new versions in Announcements category I say go for it and I will subscribe!

                                  Regarding different release channels: that sounds like an excellent idea and I will try to include this in k8s repo as well.

                                  Hmm with "TY app" did you mean the "app" container, or all containers :- )

                                  Hehe yes the app container :)

                                  Edit: "... rough shortcuts to override Postgres and Elasticsearch log to file and instead point to stdout". Oh, it's primarily those two that are annoying to reconfigure? (so they log to stdout)

                                  Yes, exactly. The other containers seem to log to stdout already.

                                  1. C
                                    Christofer J Ekervhén @ChrisEke2021-02-08 22:56:23.346Zreplies toChrisEke:

                                    I've now written a more detailed guide on how Talkyard can be deployed with k8s for anyone who might be interested:
                                    https://www.ekervhen.xyz/posts/2021-02/talkyard-on-k8s/

                                    1. KajMagnus @KajMagnus2021-02-10 11:00:40.600Zreplies toChrisEke:

                                      Cool, I'll check it out, (& I'll have a Ty + K8s related question :- ) which feels a little bit more on-topic over at your blog)

                                      Logging & stdout: in the upcoming version, there's a TY_LOG_TO_STDOUT_STDERR conf val — currently only Nginx looks at it, but later, other containers will too.

                                      Postgres: Turns out you can make Postgres log to stdout by changing the container run command to: command: '--logging_collector=off'
                                      (which looks a bit odd — apparently the image's entrypoint starts Postgres and appends the command (instead of running it)).
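The "appends instead of runs" behaviour matches a pattern that is common in Docker entrypoint scripts: if the first argument starts with a dash, the server binary is put in front of it. As far as I know the official postgres image's entrypoint does this; whether Talkyard's database image inherits it unchanged is an assumption. The function below is a hypothetical stand-in that prints the resulting command line instead of executing it.

```shell
# Sketch of the "prepend the server binary when the command starts with a dash" pattern.
effective_command() {
  case "${1:-}" in
    -*) set -- postgres "$@" ;;   # looks like a flag: treat the args as postgres options
  esac
  echo "$*"                       # a real entrypoint would do: exec "$@"
}

# effective_command --logging_collector=off   prints: postgres --logging_collector=off
# effective_command bash                       prints: bash
```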

                                      ElasticSearch: Once Ty has upgraded from ES 6.x to ES 7.x, then ES will log to stdout
                                      (apparently ES 7.x always logs to stdout, when running in a Docker container, see:
                                      https://www.elastic.co/guide/en/elasticsearch/reference/current/logging.html )

                                      If you run Elasticsearch as a service, the default location of the logs varies based on your platform and installation method:
                                      [...] On Docker, log messages go to the console and are handled by the configured Docker logging driver. To access logs, run docker logs.

                                      1. KajMagnus @KajMagnus2021-02-10 11:06:29.520Zreplies toKajMagnus:

                                        I wrote:

                                        currently only Nginx looks at it, but later, other containers will too

                                        @ChrisEke I suspect that in your K8s system, Nginx actually didn't log to stdout, and you didn't see any Nginx messages? But maybe you didn't think about that, because it's not so obvious what the default log level is? I suspect that your Nginx container writes to /var/log/nginx/ currently.

                                        B.t.w. K8s is fine with log messages to any of stdout or stderr, right (?). So Nginx access log can be stdout, and Nginx error log would be stderr. (However when trying this in Docker-Compose, the stderr log messages just disappeared. So I pointed Nginx error log to stdout actually, if TY_LOG_TO_STDOUT_STDERR is set.)

                                        1. C
                                          Christofer J Ekervhén @ChrisEke2021-02-10 23:30:01.667Zreplies toKajMagnus:

                                          Postgres: Turns out you can make Postgres log to stdout by changing the container run command to: command: '--logging_collector=off'

                                          Thanks for looking this up! I've tested adding this as an arg for postgres and it works. Makes my override script redundant so I have removed it.
                                          Sounds good with ES 7 using the default logging drivers - won't be any issue getting the logs on k8s then.

                                          Regarding ty-web nginx deployment: I do get log messages, both to stdout & stderr. I had a peek in the running container in /var/log/nginx and it's configured:

                                          access.log -> /dev/stdout
                                          error.log -> /dev/stderr
                                          

                                          I don't know why though :)
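One possible explanation, offered as a guess: the official nginx Docker image creates exactly these two symlinks when the image is built, and an image that follows the same convention would show the same result. Whether Talkyard's web image does this is an assumption. A minimal sketch of that setup step, with a hypothetical function name:

```shell
# Sketch: point the Nginx log files at the container's console streams.
link_nginx_logs_to_console() {
  local log_dir="${1:-/var/log/nginx}"
  mkdir -p "$log_dir" || return 1
  ln -sf /dev/stdout "$log_dir/access.log"   # access log goes to stdout
  ln -sf /dev/stderr "$log_dir/error.log"    # error log goes to stderr
}
```

With links like these in place, Nginx keeps writing to its usual file paths while the output ends up in whatever collects the container's stdout and stderr.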