
Talkyard API: Upserting categories

By KajMagnus @KajMagnus 2019-06-14 10:38:22.764Z 2019-06-14 10:48:35.808Z

So there's going to be an endpoint for upserting categories. This is what I have in mind: (feedback welcome :- ))

POST server-address/-/v0/upsert-simple

and that endpoint accepts a SimpleUpsertV0 object that looks like this, in TypeScript (? means optional):

interface SimpleUpsertV0 {
  categories?: SimpleCategoryV0[];

  // ... more things, later, if you want to upsert many things
  // in the same database transaction.
}


interface SimpleCategoryV0 {
  id: CategoryId;      // should be > 2e9, e.g. 2 000 000 001
  extImpId?: string;   // your own external unique id for the category
  parentId?: CategoryId;         // later, skip for now
  defaultSubCatId?: CategoryId;  // later, skip for now
  name: string;
  slug: string;
  description: string;
  position?: number;
  defaultTopicType?: PageType;
}

The extImpId is your own unique id for the category. When Talkyard (Ty) upserts the category, Ty first tries to look up any existing category with this "external import id", and, if found, Ty updates that category. If not found, Ty creates a new category, assigns it an id (less than 2e9) and remembers its extImpId.
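
The lookup-then-update behavior described above could be sketched like this in TypeScript. This is only an illustration: the in-memory CategoryStore class and its field names are hypothetical and not Talkyard's actual server code.

```typescript
// Hypothetical in-memory stand-in for the categories table; not Talkyard's real code.
interface StoredCategory {
  id: number;         // real internal id, assigned by the store (below 2e9)
  extImpId?: string;  // the caller's own external unique id
  name: string;
  slug: string;
}

class CategoryStore {
  private byExtImpId = new Map<string, StoredCategory>();
  private nextId = 1;

  // Update the category with this extImpId if it exists, otherwise create it
  // and remember its extImpId.
  upsert(input: { extImpId: string; name: string; slug: string }): StoredCategory {
    const existing = this.byExtImpId.get(input.extImpId);
    if (existing) {
      existing.name = input.name;
      existing.slug = input.slug;
      return existing;
    }
    const created: StoredCategory = { id: this.nextId++, ...input };
    this.byExtImpId.set(input.extImpId, created);
    return created;
  }
}

const store = new CategoryStore();
const first = store.upsert({ extImpId: "pkg_a", name: "Package A", slug: "package-a" });
const second = store.upsert({ extImpId: "pkg_a", name: "Package A2", slug: "package-a" });
console.log(first.id === second.id);  // true: the second call updated the same category
```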

The id field should be any number > 2e9, e.g. 2 000 000 001. Numbers > 2e9 are reserved for things you upsert into Ty and that you don't know if they're present in Ty's database (and you don't want to write extra code to find out).

You can set id to any 32 bit signed integer > 2e9. If you upsert more things, like pages and sub categories, in a single HTTP request, then some of the other items you upsert can refer to the category.id, and Talkyard then knows you want to place them in that category.
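
A small range check for such temporary ids might look like the sketch below. The function name isTempImpId and the constant names are made up for illustration; the bounds follow from "any 32 bit signed integer > 2e9".

```typescript
// Assumed bounds: temporary import ids are > 2e9 and fit in a signed 32-bit integer.
const MIN_TEMP_IMP_ID = 2000000001;
const MAX_INT32 = 2147483647;

// True if the id is in the range reserved for things being upserted.
function isTempImpId(id: number): boolean {
  return Number.isInteger(id) && id >= MIN_TEMP_IMP_ID && id <= MAX_INT32;
}

console.log(isTempImpId(2000000001));  // true
console.log(isTempImpId(53));          // false: looks like a real internal id
```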

(B.t.w. I'm including the word "simple" in the URL and Typescript interface, because under the hood, more things happen, that you don't need to know about: When you create a category, then, also, a category description page gets created, and a Post that stores the category description text and edit revisions, if you edit the description later. — Probably there'll also be a "complicated" endpoint /-/v0/upsert-dump that specifies exactly what things should be upserted and doesn't do anything for you automatically / implicitly)

@chrscheuer what do you think? Something that can be done in a better way, or maybe re-thinking the whole approach?

  • 21 replies


  1. KajMagnus @KajMagnus 2019-07-01 09:23:19.323Z

    @chrscheuer

    I think you'll need to be able to assign an "External Import ID" of your choosing to the parent Packages category, so you have an id to refer to when upserting child categories for individual packages.

    Another approach would be that you refer to the Packages category via Talkyard's internal category id, but it's not impossible that exposing such internal ids will cause problems in the future, if I need to change the ids somehow, or if they get remapped to different numbers during an export and import.

    I'm thinking it'd be more future proof, if you instead give the Packages category an extImpId, say, "soundflow_packages", to refer to when upserting child categories.

    To upsert a sub category, you'd then do this:

    POST server-address/-/v0/upsert-simple
    
    {
      categories: [{
        // This is the Packages category. It's in the database already; this entry is here only so you'll
        // have a "temporary import id" to refer to ---.
        extImpId: 'soundflow_packages',                |
        id: 2000000001,                   <------------+
      }, {                                             |
        extImpId: 'someones_new_package_name',         |
        id: 2000000002,                                |
        parentId: 2000000001,          <---------------`
        name: "New Package Name",
        slug: "new_package_slug",
        description ....
      }]
    }
    

    Talkyard will then look up extImpId = 'soundflow_packages' in the database to find the real internal id X of the Packages category, and then, when saving the New Package, Talkyard sets its parentId not to 2000000001 but to X.
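
That remapping step could look roughly like the following sketch. The CatEntry type, the resolveParentId function and the real id 53 are hypothetical; the Map stands in for a database lookup by extImpId.

```typescript
interface CatEntry {
  extImpId: string;
  id: number;         // temporary import id, > 2e9
  parentId?: number;  // temporary import id of another entry in the same request
}

// realIdByExtImpId is a hypothetical stand-in for looking up extImpId in the database.
function resolveParentId(
  entry: CatEntry,
  batch: CatEntry[],
  realIdByExtImpId: Map<string, number>,
): number | undefined {
  if (entry.parentId === undefined) return undefined;
  // Find the entry in this request whose temporary id the parentId points at.
  const parentEntry = batch.find(e => e.id === entry.parentId);
  if (!parentEntry) throw new Error(`No entry with temporary id ${entry.parentId}`);
  // Swap the temporary id for the real internal id X.
  const realId = realIdByExtImpId.get(parentEntry.extImpId);
  if (realId === undefined) throw new Error(`Unknown extImpId ${parentEntry.extImpId}`);
  return realId;
}

const packagesCat: CatEntry = { extImpId: "soundflow_packages", id: 2000000001 };
const newPackage: CatEntry = {
  extImpId: "someones_new_package_name", id: 2000000002, parentId: 2000000001,
};
const realIds = new Map<string, number>([["soundflow_packages", 53]]);  // 53 is made up
console.log(resolveParentId(newPackage, [packagesCat, newPackage], realIds));  // 53
```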

    Another alternative could be:

    POST server-address/-/v0/upsert-simple
    
    {
      categories: [{
        extImpId: 'someones_new_package_name',
        id: 2000000001,
        parentExtImpId: 'soundflow_packages',    <---
        name: "New Package Name",
        slug: "new_package_slug",
        description ....
      }]
    }
    

    What do you think?

    1. Christian Scheuer @chrscheuer 2019-07-01 09:26:12.295Z

      This sounds great!

      1. Christian Scheuer @chrscheuer 2019-07-01 09:28:15.826Z

        As a general scheme, I don't know if you need to distinguish between "pure-reference entities" and "entities for upserting". Right now you seem to make the distinction based on the name/slug/description parameters being present, but imagine that all those parameters were optional (for something other than categories)..
        Idk. It's a hypothetical and not a problem for now. Problem / not a problem?

        1. Christian Scheuer @chrscheuer 2019-07-01 09:30:35.943Z

          A different way would be to have id and parentId be strings, that could be of the form:

          parentId: "talkyard:53"
          parentId: "ext:soundflow_packages"
          

          By using a scheme like this instead of magic numbers it's more manageable for any future additions.

          1. Christian Scheuer @chrscheuer 2019-07-01 09:32:08.046Z

            This doesn't change anything about the fact that the underlying ids are numerical. It's just making a more concise interface. Note this also covers the case where you need to upsert more things in the same batch and have them relate to each other, since you'd be using the "ext:xxx" form then.
            It could also support pure numerical input, in which case it would assume it was the talkyard internal id.

            1. Christian Scheuer @chrscheuer 2019-07-01 09:35:43.487Z

              Sorry for spamming. But the upside of having a unified simple string-based format for referencing entities is this can help in other ways too, for example in quick-access URLs it would be easier than having to present both entities.
              Also, the magic numbers system has the drawback that the user of the API needs to build a separate map for magic-numbers to ext-import-ids, that they don't have to with the string based format.
              It's inspired by the API auth token btw.

              1. KajMagnus @KajMagnus 2019-07-11 06:06:12.468Z 2019-07-11 06:28:24.064Z

                distinguish between "pure-reference entities" and "entities for upserting". Right now you seem to make the distinction based on the name/slug/description parameters being present, but imagine that all those parameters were optional (for something other than categories)

                That's a good point. For now, maybe we can do like this:

                {  // a category to upsert
                   extId: 'someones_new_package_name',
                   parentExtId: 'soundflow_packages',    <---
                   name: "New Package Name",
                   slug: "new_package_slug",
                   description ....
                }
                

                and then we won't need to think about that possible problem, for now.

                A different way would be to have id and parentId be strings

                Hmm. Sometimes both an internal Talkyard id, and an external id, will get imported at the same time. So I think there needs to be two separate fields, for the internal and external ids? This would happen if you export your Talkyard site from talkyard.net to a dump file, and then import to your own server. Then you'd want to reuse the same page and post internal ids (page ids typically appear in the urls) — and also import the same external ids. So, two fields, for each item that gets imported. And one would be the numeric internal id, and the other would be an external id string (no "ext:" prefix needed).

                unified simple string-based format for referencing entities is this can help in other ways too, for example in quick-access URLs

                That sounds as if it could be nice. You have in mind that it'd get used e.g. like this?:

                /-/v0/list-topics-in-category?categoryId=ext:external_category_id
                /-/v0/list-topics-by-user?userId=extSsoId:user_external_single_sign_on_id
                /-/v0/list-topics-by-user?userId=extId:user_external_id
                

                Or this would work equally well? and might be slightly simpler to implement, hmm:

                /-/v0/list-topics-in-category?categoryExtId=external_cat_id
                /-/v0/list-topics-by-user?userExtSsoId=user_external_single_sign_on_id
                /-/v0/list-topics-by-user?userExtId=user_external_id
                

                the magic numbers system has the drawback that the user of the API needs to build a separate map for magic-numbers to ext-import-ids, that they don't have to with the string based format

                That's a good point. I just did that, for a Disqus comments importer, and indeed this was a bit extra work. It'd be nice to re-think this later and make the dump files work with purely external ids. — For now, the quickest and simplest way to get something working, was actually to use these magic 2e9 + 1 numbers. This is partly related to me using Scala server side, statically typed, and creating new classes with string fields, or changing the ids from int to string, seemed like lots of work.

                ... Some time later I think I'll need to create "Patch" classes that are effectively copies of [the other classes representing things in the database], but with all fields optional. And with string extIds and parentExtIds etc. Then, by specifying only some of these optional fields, and upserting something, one can choose exactly what to change in the database. E.g. upserting a CategoryPatch, and specifying an extId and a new category slug, but no other fields -> the slug changes, but description and name and everything else are left intact.
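
The "Patch" idea could be sketched in TypeScript as below. This is an illustration under assumptions: the Category fields are taken from the examples in this thread, and applyPatch is a hypothetical helper, not something Talkyard provides.

```typescript
interface Category {
  extId: string;
  name: string;
  slug: string;
  description: string;
}

// A patch identifies its target by extId; every other field is optional.
interface CategoryPatch {
  extId: string;
  name?: string;
  slug?: string;
  description?: string;
}

// Returns a copy where only the fields present in the patch have changed.
function applyPatch(current: Category, patch: CategoryPatch): Category {
  if (patch.extId !== current.extId) throw new Error("Patch is for another category");
  const updated: Category = { ...current };
  if (patch.name !== undefined) updated.name = patch.name;
  if (patch.slug !== undefined) updated.slug = patch.slug;
  if (patch.description !== undefined) updated.description = patch.description;
  return updated;
}

const original: Category = {
  extId: "someones_new_package_name",
  name: "New Package Name",
  slug: "new-package-slug",
  description: "About this package",
};
const patched = applyPatch(original, { extId: original.extId, slug: "renamed-slug" });
console.log(patched.slug, "/", patched.name);  // renamed-slug / New Package Name
```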

                1. Christian Scheuer @chrscheuer 2019-07-11 08:13:55.270Z

                  Yea this is good progress on the issue. I'm a bit tired when reading this so not entirely sure I get everything 100%. Will re-read later. First thoughts are:

                  That's a good point. For now, maybe we can do like this:

                  { // a category to upsert
                  extId: 'someones_new_package_name',
                  parentExtId: 'soundflow_packages', <---
                  name: "New Package Name",
                  slug: "new_package_slug",
                  description ....
                  }

                  Yes this is an improvement to not have to deal with the extra pure-reference entity and magic number mappings.

                  However the "parentExtId" field doesn't solve the case where you want to reference a parent that has an existing internal talkyard ID. This would only work for referencing parents that were associated with external IDs. Maybe you want to enforce that entities that need mapping always need external IDs? Or maybe I would just need to put a different field in there if I was trying to refer to it via the parent's TY id...

                  Hmm. Sometimes both an internal Talkyard id, and an external id, will get imported at the same time. So I think there needs to be two separate fields, for the internal and external ids?

                  I agree for primary keys yes. I was only referring to using the string based approach when referencing another entity. Not for any entity's own primary keys.
                  So an entity that you send would be for example

                  {
                      categoryId: 235238458,
                      parentId: 239582 | 'ty:239582' | 'ext:lakejrha:Aerlhkjaer:ekljfdlh',
                      ....
                  }
                  

                  My point being here, you don't want conflicting references to parents. You just want 1 reference. The parent itself will have both a TY ID and an EXT ID (potentially). The reference will be to either of those two.
                  If you allow both a parentExtId and a parentId field they could be in disagreement and that means we then need to figure out who is more in charge.

                  In the roundtrip case we'd have a pure number / TY-ID reference since that's what we received in the dump. If we want to get the parent's EXT ID in the dumped version we could look it up by resolving the parentId ourselves.

                  No matter what, the problem we're trying to solve is purely how you want to reference other entities within something that we send TO the API. I think the API should always return TY's own IDs..
                  Hmm.. Or, we think of the ID systems as 2 entirely separate systems, meaning you'd always return both a parentId and a parentExtId, just for making it easier for the API consumer... I'm just not sure if this makes it easier or if it introduces "who's in charge" problems as described above...
                  At this point maybe we're overthinking this a tiny bit. But it's an interesting discussion :)

                  That sounds as if it could be nice. You have in mind that it'd get used e.g. like this?:
                  ...
                  Or this would work equally well? and might be slightly simpler to implement, hmm:
                  ...

                  Gotcha. Yea those would both work of course. I'll leave it up to you as to what makes best sense from a data perspective. Implementation wise it's just a string split so I wouldn't let that be the determining factor. I would look at if it makes sense to have a unified way across the whole system to do ID references of entities #1, or if you want to define external ID associations more manually (distinct property naming per entity) as in #2. If I were designing a schema that was to be automating syncing between 2 systems I would like this to be fairly standardised so you don't end up with slightly differing field names etc (can grow ugly over time).

                  1. KajMagnus @KajMagnus 2019-07-13 16:56:03.895Z

                    However the "parentExtId" field doesn't solve the case where you want to reference a parent that has an existing internal talkyard ID. This would only work for referencing parents that were associated with external IDs. Maybe you want to enforce that entities that need mapping always need external IDs? Or maybe I would just need to put a different field in there if I was trying to refer to it via the parent's TY id

                    I had in mind that one could choose between using 1) parentExtId: "<external_id>" or 2) parentId: <talkyard_internal_int_id>. And if one specified both, that'd work fine, as long as both ids pointed to the same parent — otherwise, the server would reply Error.
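
As a rough TypeScript sketch of that rule: either field may be given, and if both are given they must agree. The resolveParent function and the idByExtId map (a stand-in for a database lookup) are hypothetical.

```typescript
// idByExtId is a hypothetical lookup: external id -> Talkyard internal id.
function resolveParent(
  ref: { parentId?: number; parentExtId?: string },
  idByExtId: Map<string, number>,
): number | undefined {
  let fromExt: number | undefined;
  if (ref.parentExtId !== undefined) {
    fromExt = idByExtId.get(ref.parentExtId);
    if (fromExt === undefined) throw new Error(`Unknown parentExtId: ${ref.parentExtId}`);
  }
  // Both specified: fine only if they point to the same parent.
  if (ref.parentId !== undefined && fromExt !== undefined && ref.parentId !== fromExt) {
    throw new Error("parentId and parentExtId point to different parents");
  }
  return ref.parentId !== undefined ? ref.parentId : fromExt;
}

const knownIds = new Map<string, number>([["soundflow_packages", 53]]);  // 53 is made up
console.log(resolveParent({ parentExtId: "soundflow_packages" }, knownIds));                // 53
console.log(resolveParent({ parentId: 53, parentExtId: "soundflow_packages" }, knownIds));  // 53
```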

                    I was only referring to using the string based approach when referencing another entity. Not for any entity's own primary keys.

                    Ok

                    don't want conflicting references to parents. You just want 1 reference

                    I agree; just 1 ref is simpler

                    Now I'm thinking there can instead be a parentRef (note: Ref ) field, to reference the parent. It can be a string, e.g. "extid:soundflow_packages" or "tyid:1234". And in a url:

                    https://server/-/v0/list-categories?inParentCategory=extid:soundflow_packages
                    

                    (and, like you wrote above, if there's no type: prefix, then the value is interpreted as a Talkyard internal id.)

                    I'm thinking using another name for these reference values makes it easier to talk about. When one says "parentId" one always means Talkyard's internal ids. And "parentRef" or "whateverRef" means these "extensible" text values with a type: prefix.

                    (And one typically wouldn't include parent ids, when upserting something. Only a ref to the parent. And 1) if one includes an id anyway, it needs to point to the same thing, as the parent ref does. And 2) when one exports a Talkyard site to a dump file, only the parentIds would be included in the exported json.)

                    1. Christian Scheuer @chrscheuer 2019-07-14 16:38:45.799Z

                      parentRef is a great idea, to distinguish clearly between ids and references.

                      Now I'm thinking there can instead be a parentRef (note: Ref ) field, to reference the parent. It can be a string, e.g. "extid:soundflow_packages" or "tyid:1234". And in a url:
                      https://server/-/v0/list-categories?inParentCategory=extid:soundflow_packages
                      (and, like you wrote above, if there's no type: prefix, then the value is interpreted as a Talkyard internal id.)

                      Yes I completely agree.

                      1. Christian Scheuer @chrscheuer 2019-07-14 18:29:33.902Z

                        The distinction between refs and IDs also means there's a clear distinction between API input and output. Input would use refs for refs to parents, groups etc, whereas API output would be IDs (ie. pure talkyard integers) - for example if you're listing categories, posts etc.
                        This makes sense to me.

    2. In reply to KajMagnus:
      KajMagnus @KajMagnus 2019-07-14 02:15:14.024Z

      @chrscheuer What about permissions? The upserted Soundflow package categories, are they to be publicly visible, or visible only to community members?

      Maybe the JSON can be like: (note the permissions: array at the end, and the ...Ref: 'extid:...' references)

      POST server-address/-/v0/upsert-simple
      
      {
      categories: [{
         extId: 'a_package_name',
         parentRef: 'extid:soundflow_packages',
         name: 'Package Name',
         slug: 'package_slug',
         description: '...',
         permissions: [{
           forMemberRef: 'extid:all_members',
           maySee: true,
           mayCreatePages: true,
           mayPostReplies: true,
           mayEditOwn: true,
         }, {
           forMemberRef: 'extid:staff',   // admins and moderators
           maySee: true,
           mayCreatePages: true,
           mayPostReplies: true,
           mayEditAll: true,     // for staff only
           mayDeleteAll: true,   // for staff only
         }]
      }]
      }
      1. Christian Scheuer @chrscheuer 2019-07-14 16:43:23.681Z

        This looks great.

        Our auto-created packages should be visible to anybody for now, but I agree it makes sense to touch on the permissions in the upsert API schema. Maybe it can default to "normal public category" if no permissions are specified, just for ease of use?

        We're currently using "Trusted member" to give our alpha/beta team access to restricted categories that are only meant for them. Ideally, whichever solution is made for the permissions schema, it will be forwards compatible if you make access groups a real feature with more granular control some time in the future. I'm thinking the forMemberRef is referring to groups, so this sounds like you already have it covered.
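
The permissions entries and the "use a default if none are specified" idea could be typed roughly like this. The interface and function names are hypothetical, and what a "normal public category" should grant is left to the caller here, since the thread doesn't settle it.

```typescript
// Field names follow the JSON example in this thread; the interface name is made up.
interface CategoryPermsV0 {
  forMemberRef: string;  // e.g. 'extid:all_members' or 'extid:staff'
  maySee?: boolean;
  mayCreatePages?: boolean;
  mayPostReplies?: boolean;
  mayEditOwn?: boolean;
  mayEditAll?: boolean;
  mayDeleteAll?: boolean;
}

interface CategoryWithPerms {
  extId: string;
  permissions?: CategoryPermsV0[];
}

// If the caller didn't specify any permissions, fall back to the given defaults.
function withDefaultPerms(cat: CategoryWithPerms, defaults: CategoryPermsV0[]): CategoryWithPerms {
  if (cat.permissions !== undefined) return cat;
  return { ...cat, permissions: defaults };
}

const membersMayPost: CategoryPermsV0 = {
  forMemberRef: "extid:all_members",
  maySee: true,
  mayCreatePages: true,
  mayPostReplies: true,
  mayEditOwn: true,
};
const defaultPerms = [membersMayPost];
const withPerms = withDefaultPerms({ extId: "a_package_name" }, defaultPerms);
console.log(withPerms.permissions === defaultPerms);  // true
```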

        1. Christian Scheuer @chrscheuer 2019-07-14 16:45:43.980Z

          Oh wow. Now that I took a look, I see there's now a Custom Groups section!! That's epic!! When did this come? :)
          You should do a newsletter or something where we can follow the progress haha so we don't miss new awesome features

          1. Christian Scheuer @chrscheuer 2019-07-14 18:13:58.575Z

            One thing to note. If we go with the "extid:blabla" and "tyid:blabla" string ref thing. Make sure that it parses stuff like "extid:blabla:blabla:blabla" correctly, that is when the external ID itself has colons (since ours do). This essentially makes it a URN with a custom scheme name.
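
One way to get the colon handling right is to split only at the first colon, as in this TypeScript sketch. The parseRef function and the ParsedRef type are hypothetical; the "extid:" and "tyid:" prefixes and the "no prefix means internal id" rule are the ones proposed earlier in this thread.

```typescript
type ParsedRef =
  | { kind: "tyid"; id: number }
  | { kind: "extid"; extId: string };

function parseRef(ref: string): ParsedRef {
  // Split at the FIRST colon only, so colons inside the external id survive.
  const colonIx = ref.indexOf(":");
  // No "type:" prefix: interpret the value as a Talkyard internal id.
  const prefix = colonIx === -1 ? "tyid" : ref.slice(0, colonIx);
  const value = colonIx === -1 ? ref : ref.slice(colonIx + 1);
  if (prefix === "extid") {
    if (value.length === 0) throw new Error("Empty external id");
    return { kind: "extid", extId: value };
  }
  if (prefix === "tyid") {
    if (!/^[0-9]+$/.test(value)) throw new Error(`Bad internal id: ${value}`);
    return { kind: "tyid", id: parseInt(value, 10) };
  }
  throw new Error(`Unknown ref type: ${prefix}`);
}

console.log(parseRef("extid:blabla:blabla:blabla"));  // kind "extid", extId "blabla:blabla:blabla"
console.log(parseRef("tyid:1234"));                   // kind "tyid", id 1234
console.log(parseRef("1234"));                        // kind "tyid", id 1234
```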

            1. Christian Scheuer @chrscheuer 2019-07-14 18:45:13.058Z 2019-07-14 19:18:52.841Z

              Another thing I'm thinking could be good for the API implementation, maybe not in version 1, but down the road, is to address the very very usual case where all we need is just a link to an existing category since it was already created. 99.9% of requests, even a higher percentage down the road, are going to be read-only requests, not needing to upsert anything.
              It may be relevant to have a very speedy API that just gives back a link from an in-memory cache / redis based on the provided extId. This API would then fail in the case where the category hasn't been created.
              Of course we can build this caching layer ourselves too, so it's not a high priority. Just something worth considering at one point. Maybe it's even better if we build this cache ourselves actually, since we'll know when the original data (name etc) changed, which could change the URL, so we'd render our cache invalid in those cases.
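
A caller-side cache like the one described could be as small as this TypeScript sketch. The CategoryUrlCache class is hypothetical, and it is a plain in-memory Map; a Redis-backed version would follow the same shape.

```typescript
// Hypothetical cache on the API caller's side: extId -> category URL path.
class CategoryUrlCache {
  private urlByExtId = new Map<string, string>();

  // Store the url path returned by an upsert.
  remember(extId: string, urlPath: string): void {
    this.urlByExtId.set(extId, urlPath);
  }

  // Fails when the category hasn't been created (or cached) yet.
  lookup(extId: string): string {
    const url = this.urlByExtId.get(extId);
    if (url === undefined) throw new Error(`No cached URL for ${extId}`);
    return url;
  }

  // Call when the name or slug changes, since that could change the URL.
  invalidate(extId: string): void {
    this.urlByExtId.delete(extId);
  }
}

const urlCache = new CategoryUrlCache();
urlCache.remember("a_package_name", "/latest/packages/my-package");
console.log(urlCache.lookup("a_package_name"));  // /latest/packages/my-package
```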

      2. In reply to KajMagnus:
        Christian Scheuer @chrscheuer 2019-06-14 12:44:34.620Z

        Wohoo! What a great Friday gift :)

        @chrscheuer what do you think? Something that can be done in a better way, or maybe re-thinking the whole approach?

        This definitely seems like a very good approach very close to what I was hoping for.
        I completely understand what you mean by "simple" and I agree with that concept. To me making the API surface smaller is only gonna help creating tests for it and ensuring that it works across all cases.

        Couple of things we should consider:

        • What does the endpoint return? Some would probably like the id, however we'd probably just want the url (value taken from the generated page's pagePath.. if I remember correctly). Maybe we need a SimpleCategoryUpsertResponseV0 interface that either responds with simple things like an url, and/or with all the generated entries from the various tables.

        • URLs matter. Right now all posts/threads/pages have id + slug based urls, where the slug can change without affecting the URL since the id stays the same. It appears categories don't have this (at least right now). In the SimpleCategoryV0 interface the slug is a required parameter. This leads me to think that by changing the slug we can (accidentally or because we want to) change the url of the category. We need to think about how this behavior should be. Our users can change the name of a package but it stays with the same internal ID (and thus internal url). We'd want the category url to stay the same too, across different slugs.
          Maybe you already have support for this, but it would make sense to me that these auto-generated categories would utilize the same url pattern as posts. Maybe a flag to set in the interface?

        • Now for the interesting part. In the future you'll want to support sub-categories. We'll need to think about how those urls should look. To the user, it would make sense that each subcategory appends itself to the url of its parent. However that doesn't seem consistent with the -uniqueid/slug pattern that I requested above. So this is something that needs to be considered.

        • Slug algorithm/restrictions. Which characters are allowed in the slug? Which length?

        • id magic numbers > 2e9. This sounds great and well thought out for more complex insertions.

        • Edit: added this: Are there any limitations for extId strings length wise? Our id's for packages are concatenated user id + package id + so ours are about 60 characters long...

        That's what I think for now. Thank you so much for looking into this :)

        1. Christian Scheuer @chrscheuer 2019-06-14 17:19:23.332Z

          Another way to handle category url changes would be to make sure to keep a "history" inside TY of previous urls, so that categories whose urls (slugs) change, would just have permanently occupied the old space that will then redirect to the new one.
          That would keep it possible to have easy-to-read nested category urls, like:
          http://forum.soundflow.org/latest/packages/my-package
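
The url history idea could be sketched like this in TypeScript. The CategoryPaths class and the category id 1 are made up for illustration; a real implementation would live in the database.

```typescript
// Hypothetical path history: every path a category has had keeps pointing to it.
class CategoryPaths {
  private currentPathById = new Map<number, string>();
  private ownerByPath = new Map<string, number>();  // includes old paths

  // Set (or change) the path of a category; the old path stays occupied.
  setPath(categoryId: number, path: string): void {
    this.currentPathById.set(categoryId, path);
    this.ownerByPath.set(path, categoryId);
  }

  // Returns the up-to-date path to serve or redirect to,
  // or undefined if the path has never been used.
  resolve(path: string): string | undefined {
    const id = this.ownerByPath.get(path);
    return id === undefined ? undefined : this.currentPathById.get(id);
  }
}

const catPaths = new CategoryPaths();
catPaths.setPath(1, "/latest/packages/my-package");
catPaths.setPath(1, "/latest/packages/my-renamed-package");
console.log(catPaths.resolve("/latest/packages/my-package"));  // /latest/packages/my-renamed-package
```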

          1. KajMagnus @KajMagnus 2019-06-18 12:15:15.314Z
            • What does the endpoint return: yes, seems like the category id and the slug should be included in the reply. Maybe a list of all things that got created, yes. Maybe you'd find the category and its id and slug like so: responseJson.categories[0].slug and [0].id and .urlPath.

            • Category slugs and renaming categories: I like the approach with keeping a history, and having old (sub) category url paths redirect to the current up-to-date path, like /latest/packages/current-package-name. This is b.t.w. how discussion topic urls already work: if you change the url to a topic, the old url redirects to the new (should work also for pages for which the /-1234/ id number has been excluded in the url (there's an "advanced" setting for hiding the page id in the url))

              I'm thinking people renaming one package to [the previous name of another package], is something you'll need to deal with in Soundflow? (maybe by not allowing such renames?) And it's okay if, in Talkyard, a category Bbb gets renamed to [a previous name of another category Aaa]: then Ty stops the old url path to Aaa from redirecting to the new path to Aaa, and instead that url path will lead to Bbb (although it was previously used by Aaa).

            • "any limitations for extId strings length wise?" — I have in mind an external-import-id 100 char length restriction (i.e. > 60).

            • "Which characters are allowed in the slug": I have in mind lowercase alphanumeric characters plus hyphens, where a hyphen is ok inside the slug (but not at the start or end), and at least one letter (not only digits).
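
The slug and extId restrictions in the last two bullets could be checked along these lines. This is a sketch of the rules as stated, not Talkyard's actual validation; the function names are made up, and rejecting an empty extId is an extra assumption.

```typescript
// Assumed limit, from "100 char length restriction".
const MAX_EXT_ID_LENGTH = 100;

// Lowercase alphanumeric plus hyphens inside, not starting or ending with a hyphen,
// and at least one letter (not only digits).
function isValidSlug(slug: string): boolean {
  const shapeOk = /^[a-z0-9]([a-z0-9-]*[a-z0-9])?$/.test(slug);
  const hasLetter = /[a-z]/.test(slug);
  return shapeOk && hasLetter;
}

// Non-empty (an assumption) and at most 100 characters.
function isValidExtId(extId: string): boolean {
  return extId.length > 0 && extId.length <= MAX_EXT_ID_LENGTH;
}

console.log(isValidSlug("new-package-slug"));  // true
console.log(isValidSlug("new_package_slug"));  // false: underscores aren't in the allowed set
console.log(isValidSlug("1234"));              // false: only digits
```

Note that under these rules the underscore slugs used in the earlier examples would need hyphens instead.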

            1. Christian Scheuer @chrscheuer 2019-06-22 17:21:54.918Z

              What does the endpoint return: yes, seems like the category id and the slug should be included in the reply. Maybe a list of all things that got created, yes. Maybe you'd find the category and its id and slug like so: responseJson.categories[0].slug and [0].id and .urlPath.

              Sounds good to me - so basically close to a copy of the simple input with urlPath added...
              It might make sense at one point to also have the complete raw entries present in the response, but for the current use cases that would just make the response larger and thus slow stuff down (slightly). Just thinking about how generic we can make the request/response object layout so that it's flexible enough for other use cases too.
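
A response shaped like "the simple input with urlPath added" might be typed and read as in this sketch. Nothing here is a confirmed API: the interface layout is a guess based on responseJson.categories[0].slug, .id and .urlPath, and the sample values (id 53 included) are made up.

```typescript
// Assumed response shape; only id, slug and urlPath are mentioned in the thread.
interface SimpleCategoryResponseV0 {
  id: number;       // Talkyard's real internal id
  extId?: string;
  slug: string;
  urlPath: string;
}

interface SimpleCategoryUpsertResponseV0 {
  categories: SimpleCategoryResponseV0[];
}

// Builds the full URL to the first upserted category.
function firstCategoryUrl(siteOrigin: string, response: SimpleCategoryUpsertResponseV0): string {
  const cat = response.categories[0];
  if (cat === undefined) throw new Error("No categories in response");
  return siteOrigin + cat.urlPath;
}

const responseText =
  '{"categories":[{"id":53,"extId":"a_package_name","slug":"my-package",' +
  '"urlPath":"/latest/packages/my-package"}]}';
const responseJson = JSON.parse(responseText) as SimpleCategoryUpsertResponseV0;
console.log(firstCategoryUrl("http://forum.soundflow.org", responseJson));
// http://forum.soundflow.org/latest/packages/my-package
```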

              I'm thinking people renaming one package to [the previous name of another package], is something you'll need to deal with in Soundflow? (maybe by not allowing such renames?) And it's okay if, in Talkyard, a category Bbb gets renamed to [a previous name of another category Aaa]: then Ty stops the old url path to Aaa from redirecting to the new path to Aaa, and instead that url path will lead to Bbb (although it was previously used by Aaa).

              Agree completely.
              This also confirms that the API caller (SoundFlow in this case) should be in charge of creating slugs since that puts the responsibility on us for detecting identical URLs and providing a UI for the user or an algorithm to make sure that it doesn't happen.
              The opposite approach would be for the slug algo to sit with Talkyard and then TY would just append numbers to slugs in case something already existed at the same address/path - but that would give the API caller less control.

              "any limitations for extId strings length wise?" — I have in mind an external-import-id 100 char length restriction (i.e. > 60).

              Sounds great!

              "Which characters are allowed in the slug": I have in mind lowercase alphanumeric characters plus hyphens, where a hyphen is ok inside the slug (but not at the start or end), and at least one letter (not only digits).

              Sounds great!

        2. Progress
          with doing this idea
        3. @KajMagnus marked this topic as Started 2019-06-23 00:09:08.286Z.