
Bulk import from Disqus

By Jason @detly 2018-04-22 11:33:46.705Z

I'm migrating over from Disqus to Talkyard, and I have a lot of old comments in Disqus. Fortunately Disqus allows you to export comments, so I'm wondering what the best way to import them into Talkyard would be.

Just to be clear on what I'm asking, I don't need a "does-everything-for-me" wizard that understands Disqus' export format. I'm happy to write some Python and parse the XML. It's the "getting it into Talkyard" step I'm not sure about. I'd be submitting a massive number of comments under other people's email addresses, so I'm thinking I might need to turn off the spam/flood/auth filters, unless there's a method for an admin to post on behalf of someone else?

But more generally, how would I programmatically populate these comments? Are there docs for this, or a particular source file I should look at? Disqus gives me:

  • Display name
  • Email
  • Date/time
  • IP address
  • Comment
  • Threading information
  • The post it was on (via another XML section)

Where should I start?
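
For the parsing half, a minimal sketch of pulling the fields listed above out of an export with Python's standard library. The XML below is a simplified, hypothetical shape made up for illustration; it is not the real Disqus schema, so the element names and the `parse_export` helper are assumptions to adapt to the actual file.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified sample. NOT the real Disqus export schema.
SAMPLE_XML = """
<export>
  <thread id="t1"><link>https://blog.example.com/hello</link></thread>
  <post id="p1">
    <thread>t1</thread>
    <author><name>Alice</name><email>alice@example.com</email></author>
    <createdAt>2018-04-01T10:00:00Z</createdAt>
    <ipAddress>203.0.113.7</ipAddress>
    <message>&lt;p&gt;First!&lt;/p&gt;</message>
  </post>
  <post id="p2">
    <thread>t1</thread>
    <parent>p1</parent>
    <author><name>Bob</name><email>bob@example.com</email></author>
    <createdAt>2018-04-01T11:30:00Z</createdAt>
    <ipAddress>203.0.113.8</ipAddress>
    <message>&lt;p&gt;Reply to Alice.&lt;/p&gt;</message>
  </post>
</export>
"""


def parse_export(xml_text):
    """Return one dict per comment, with the page URL resolved via its thread."""
    root = ET.fromstring(xml_text.strip())
    # Top-level <thread> elements map a thread id to the page it belongs to.
    page_url_by_thread = {t.get("id"): t.findtext("link") for t in root.findall("thread")}
    comments = []
    for post in root.findall("post"):
        comments.append({
            "id": post.get("id"),
            "page_url": page_url_by_thread[post.findtext("thread")],
            "parent_id": post.findtext("parent"),  # None for top-level comments
            "name": post.findtext("author/name"),
            "email": post.findtext("author/email"),
            "created_at": post.findtext("createdAt"),
            "ip": post.findtext("ipAddress"),
            "html": post.findtext("message"),
        })
    return comments


comments = parse_export(SAMPLE_XML)
```

Each resulting dict carries the display name, email, date/time, IP address, comment body, parent comment and page in one place, which is the input the later conversion step would need.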

  • 17 replies


  1. KajMagnus @KajMagnus 2018-04-23 13:58:42.617Z

    would I programmatically populate these comments? Are there docs for this, or a particular source file I should look at?

    There is an HTTP endpoint to which one can POST a JSON file, with users, emails, topics, comments etcetera, in a Talkyard-specific JSON structure. But right now it's for end-to-end tests only (it says 401 Forbidden for anything that isn't an end-to-end test).

    I could look into enabling it for "real" usage, and see what more things it maybe needs to do / support, to be able to import Disqus comments.

    And then, if you write a Python script that converts from Disqus XML to Talkyard's JSON format, you could send the JSON file to the Talkyard server (when you're logged in as admin), and all comments would get imported.

    (Here's the source: https://github.com/debiki/talkyard/blob/master/app/controllers/ImportExportController.scala )

    Here's how the JSON looks: (this JSON creates an end-to-end test site. It's an excerpt; I deleted things that are off-topic for Disqus comments)

    (this is just to give you an idea about roughly how it looks; probably you need more details to be able to write the Python script. Also, there are some fields below that you don't need to send to the server, since it could fill them in itself.)

    {
      "members": [
        {
          "id": 101,
          "username": "owen_owner",
          "fullName": "Owen Owner",
          "createdAtMs": 1449198824000,
          "emailAddress": "e2e-test--owen-owner@example.com",
          "emailVerifiedAtMs": 1449198824000,
          "passwordHash": "cleartext:publicOwen123",
          "password": "publicOwen123",
          "isOwner": true,
          "isAdmin": true,
          "trustLevel": 2
        },
        {
          "id": 102,
          "username": "mod_mons",
          "fullName": "Mod Mons",
          "createdAtMs": 1449198824000,
          "emailAddress": "e2e-test--mod-mons@example.com",
          "emailVerifiedAtMs": 1449198824000,
          "passwordHash": "cleartext:publicMons123",
          "password": "publicMons123",
          "isModerator": true,
          "trustLevel": 2
        }
        ...
      ],
      "identities": [],
      "guests": [
        {
          "id": -10,
          "fullName": "Guest Gunnar",
          "createdAtMs": 1449198824000,
          "emailAddress": "e2e-test--guest-gunnar@example.com",
          "isGuest": true
        }
        ...
      ],
      "pages": [
        {
          "id": "byMariaCategoryA",
          "role": 12,
          "categoryId": 2,
          "authorId": 106,
          "createdAtMs": 1449198824000,
          "updatedAtMs": 1449198824000,
          "version": 1
        },
        {
          "id": "byMariaCategoryA_2",
          "role": 12,
          "categoryId": 2,
          "authorId": 106,
          "createdAtMs": 1449198824000,
          "updatedAtMs": 1449198824000,
          "version": 1
        }
        ...
      ],
      "pagePaths": [
        {
          "folder": "/",
          "pageId": "byMariaCategoryA",
          "showId": false,
          "slug": "by-maria-category-a"
        },
        {
          "folder": "/",
          "pageId": "byMariaCategoryA_2",
          "showId": false,
          "slug": "by-maria-category-a-2"
        }
        ...
      ],
      "posts": [
        {
          "id": 114,
          "pageId": "byMariaCategoryA",
          "nr": 1,
          "createdAtMs": 1449198824000,
          "createdById": 106,
          "currRevStartedAtMs": 1449198824000,
          "currRevById": 106,
          "numDistinctEditors": 1,
          "approvedSource": "By Maria in CategoryA, text text text.",
          "approvedHtmlSanitized": "<p>By Maria in CategoryA, text text text.</p>",
          "approvedAtMs": 1449198824000,
          "approvedById": 1,
          "approvedRevNr": 1,
          "currRevNr": 1
        },
        {
          "id": 115,
          "pageId": "byMariaCategoryA_2",
          "nr": 0,
          "createdAtMs": 1449198824000,
          "createdById": 106,
          "currRevStartedAtMs": 1449198824000,
          "currRevById": 106,
          "numDistinctEditors": 1,
          "approvedSource": "By Maria in CategoryA nr 2 title",
          "approvedHtmlSanitized": "By Maria in CategoryA nr 2 title",
          "approvedAtMs": 1449198824000,
          "approvedById": 1,
          "approvedRevNr": 1,
          "currRevNr": 1
        }
        ...
      ]
    }
    
    1. Jason @detly 2018-04-28 10:42:04.690Z

      Thanks for this! I'll work on a script to create the JSON, and maybe by the time I've finished either you'll have an endpoint for it to be posted to or I'll have learnt Scala.

      A few questions:

      • Just overall, which source file should I dig into to understand the structure of this?
      • I notice that the guest ID is -10. Are all guest IDs negative?
      • How does threading work? Can I link a post to a parent post?
      • What's nr in the post data?
      • Can I skip the approvedSource since I already have my sanitised HTML via Disqus?
      1. KajMagnus @KajMagnus 2018-04-30 10:30:19.173Z

        Ok :- )

        which source file should I dig into

        The end-to-end test files I would suggest. They create the JSON structure a Disqus importer also would need to create. Look here:

        • A TypeScript definition of the JSON structure, interface SiteData in tests/e2e/test-types.ts

        • A function that constructs a discussion topic and adds to that JSON structure: addPage, here in tests/e2e/utils/site-builder.ts.
          The field role: PageRole should be set to PageRole.EmbeddedComments = 5 (an enum) for embedded comments topics.
          (here's that enum: client/app/model.ts )

        • How to create user JSON objects: functions like memberMaria and guestGunnar, in tests/e2e/utils/make.ts

        • Adding users to the JSON obj, in site-builder.ts
          e.g. site.members.push(forum.members.mallory);

          You could import the Disqus users either 1) into guest accounts (they don't need any password or username), or 2) into "real" accounts, i.e. with password and username. In the second case I suppose you'd generate random passwords, and if someone who has commented on your blog previously wants to continue using the same account, s/he would click "Forgot password" and get a password reset email.

        Are all guest IDs negative?

        Yes, ids <= -10 are for guests, and ids >= 100 are for members with real accounts. There are some magic ids too, from -9 up to +9, like +1 for the System user. And (in case you're curious) the default built-in groups (Everyone, New Members, ... Regular Members, Core Members) have ids 10, 11, 12, ...
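
As a rough illustration of those id ranges, here is a small sketch that classifies an id and hands out guest ids counting down from -10. The helper names `kind_of_id` and `assign_guest_ids` are made up, and labelling everything from 10 to 99 as "group" is an assumption (only 10, 11, 12, ... for the built-in groups is stated above). The script cannot know which ids the database already uses, so the assigned ids are placeholders only.

```python
def kind_of_id(user_id):
    """Classify a user id according to the ranges described above."""
    if user_id <= -10:
        return "guest"
    if user_id >= 100:
        return "member"
    if -9 <= user_id <= 9:
        return "built-in"  # magic ids, e.g. 1 is the System user
    return "group"  # assumption: 10..99 treated as built-in groups / reserved


def assign_guest_ids(emails, first_id=-10):
    """Give each distinct email one guest id: -10, -11, -12, ... (placeholders)."""
    ids = {}
    next_id = first_id
    for email in emails:
        if email not in ids:
            ids[email] = next_id
            next_id -= 1
    return ids


guest_ids = assign_guest_ids(["a@example.com", "b@example.com", "a@example.com"])
```

Repeat commenters keep a single guest id, so all of one person's comments end up attached to the same guest.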

        How does threading work? Can I link a post to a parent post?
        What's nr in the post data?

        One links to the parent post, via the field parentNr. Each post has a field nr which is the order in which that post was added to the discussion.

        The page title has nr = 0, page body (a.k.a. the Original Post, for forum topics) has nr 1. The first comment has nr 2, and parentNr = 1. The 2nd comment has nr = 3, and parentNr is 1 or 2, depending on if it replies to the blog post = nr 1, or to the first comment = nr 2. And so on.
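
A minimal sketch of that numbering, assuming the comments of one page arrive as dicts with an `id`, an optional `parent_id` and a sortable `created_at` (for example ISO timestamps in one consistent format). The function name `assign_post_nrs` and the input shape are made up for illustration.

```python
def assign_post_nrs(comments):
    """Assign nr / parentNr to the comments of ONE page.

    nr 0 is the title and nr 1 the page body, so comments start at nr 2.
    """
    nr_by_id = {}
    posts = []
    ordered = sorted(comments, key=lambda c: c["created_at"])
    for nr, comment in enumerate(ordered, start=2):
        nr_by_id[comment["id"]] = nr
        # Top-level comments (parent_id None) reply to the page body, nr 1.
        # An unknown parent also falls back to nr 1.
        parent_nr = nr_by_id.get(comment["parent_id"], 1)
        posts.append({"nr": nr, "parentNr": parent_nr, "sourceId": comment["id"]})
    return posts


example = [
    {"id": "b", "parent_id": "a", "created_at": "2018-01-02T00:00:00Z"},
    {"id": "a", "parent_id": None, "created_at": "2018-01-01T00:00:00Z"},
    {"id": "c", "parent_id": None, "created_at": "2018-01-03T00:00:00Z"},
]
posts = assign_post_nrs(example)
```

Sorting by creation time first means a parent always gets its nr before its replies look it up. The `sourceId` key is only there to trace a post back to its original comment; it is not a Talkyard field.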

        Embedded discussion pages have auto-generated titles like "Comments for <the blog post url>".

        There's also an id field, which uniquely identifies a comment in the database. nr is unique within a certain discussion only. If an admin moves a comment from one discussion to another, it'll get a new nr, but keep the same id.

        Note to myself: I'll probably need to make the importer work, without any id fields. It's not really possible for you to know which ids to use, since there are some ids in the database already (and those should be avoided).

        Can I skip the approvedSource since I already have my sanitised HTML via Disqus?

        It's used for editing: If someone decides to edit a post (e.g. you — admins can edit other's posts), the editor will display the source for that comment (which is the approvedSource field in the JSON to import).

        You can set approvedSource to the HTML exported from Disqus — that is, set both approvedSource and approvedHtmlSanitized to the post's HTML. Then, if someone wants to edit a comment imported from Disqus, s/he'll see & can edit the HTML from Disqus.
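
Put together, one post object could be built roughly like this. It is a sketch under assumptions: the field names are copied from the JSON excerpt earlier in the thread, `make_post` and `to_ms` are made-up helper names, the timestamp format is assumed to be UTC with a trailing `Z`, and the `id` field is left out because, as noted above, the importer would probably need to work without it. Whether the server accepts exactly this set of fields is not confirmed here.

```python
from datetime import datetime, timezone


def to_ms(iso_utc):
    """Convert e.g. '2015-12-04T03:13:44Z' to milliseconds since the Unix epoch."""
    dt = datetime.strptime(iso_utc, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
    return int(dt.timestamp() * 1000)


def make_post(page_id, nr, parent_nr, author_id, html, created_iso):
    """Build one post dict, using the exported HTML as both source and sanitized HTML."""
    ms = to_ms(created_iso)
    return {
        "pageId": page_id,
        "nr": nr,
        "parentNr": parent_nr,
        "createdAtMs": ms,
        "createdById": author_id,
        "currRevStartedAtMs": ms,
        "currRevById": author_id,
        "numDistinctEditors": 1,
        "approvedSource": html,         # what the editor shows when someone edits
        "approvedHtmlSanitized": html,  # what gets rendered
        "approvedAtMs": ms,
        "approvedById": 1,              # 1 is the System user
        "approvedRevNr": 1,
        "currRevNr": 1,
    }


post = make_post("examplePage", 2, 1, -10, "<p>First!</p>", "2015-12-04T03:13:44Z")
```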


        I hope this helps :- ) I've also started looking a little at what I need to do server side.

        1. KajMagnus @KajMagnus 2018-04-30 15:51:13.933Z

          Mentioning @detly. So you'll get a notification email and see my comment above.

          (About a week ago I changed the email notification sent-from address, but forgot to verify the new sent-from address, so no emails got sent :- P )

    2. Progress
      with doing this idea
    3. Steve Mitchell @SteveM 2019-04-19 23:15:48.477Z

      Did anyone get code or bits and pieces working to enable this? I have a site I'd love to migrate to Talkyard, but it has lots of comments in Disqus that I can't lose, and would need to import...

      1. KajMagnus @KajMagnus 2019-04-20 05:37:09.420Z replies to SteveM:

        Hi @SteveM, right now there's no import-from-Disqus (that I know about). I started writing an importer, then postponed that. Recently I did a bit more related work ... Maybe in two months, there'll be an importer.

        1. @KajMagnus marked this topic as Started 2019-06-11 15:12:24.104Z.
        2. KajMagnus @KajMagnus 2019-06-11 15:12:11.895Z 2019-06-11 15:20:06.195Z

          @SteveM (and Jason): Now I've resumed working on the Disqus comments importer. Likely it'll be available in one or two months.

          Maybe in the beginning, before it's "100% well tested", it'll work like this: you'd send me a Disqus XML export file, and I'd first test-import it myself to a test server to verify all is fine, and then import it to the real production environment. Only later would people do this themselves.

          Probably I'll enable exporting-one's-site-as-JSON for everyone also, as part of this.

          1. Steve Mitchell @SteveM 2019-06-11 15:23:37.043Z replies to KajMagnus:

            Happy to provide an exported copy of mine, as long as you're careful not to change the URLs or otherwise as Disqus might then start pointing follow on comments to you!

            1. KajMagnus @KajMagnus 2019-06-18 12:47:10.577Z replies to SteveM:

              Ok, that sounds good. Actually I don't quite understand this: "Disqus might then start pointing follow on comments to you". What's that? I mean, pointing follow-on comments to Talkyard?

              With not changing the URLs, does that mean that the blog comments should be available at the exact same page URL when using Talkyard as with Disqus? So if https://server/the/page is the URL of a page with Disqus comments, then, after importing to Talkyard, those same comments should appear at that exact same URL? (That's how I have in mind to make things work.)

              1. Steve Mitchell @SteveM 2019-06-18 16:13:10.411Z replies to KajMagnus:

                Ah sorry.... Importers will often change the URL to point to another site while testing, which changes the original comment to point there, which I definitely don't want. That's all I meant.

                1. KajMagnus @KajMagnus 2019-06-23 00:13:28.274Z replies to SteveM:

                  Thanks for explaining, .... Hmm, how can a comment point to anywhere? I'm thinking a comment is text, not a link?

                  1. KajMagnus @KajMagnus 2019-06-23 00:15:12.497Z

                    Brief status update: Coding wise, this is mostly done, however maybe 1 week for fixing not-so-common "corner cases" and 1 week code review and 1 week writing tests, remain ...

                    I'm making the imports idempotent, meaning, if one imports the same Disqus comments many times, no duplicated comments get created. And one can import a Disqus export file, then import the almost same file again but with a few more comments, and this'll work fine: the 2nd time, only the a-few-more-comments (that weren't present in the 1st export file) get created in Talkyard.
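
The idea can be illustrated with a tiny sketch. This is not how Talkyard implements it; it only assumes that every exported comment carries a stable identifier from Disqus and uses a plain dict as a stand-in for the database. The function name `import_comments` is made up.

```python
def import_comments(store, comments):
    """Insert comments not seen before; return how many were newly created.

    `store` maps the comment's original (external) id to the comment and
    stands in for the database.
    """
    created = 0
    for comment in comments:
        if comment["id"] in store:
            continue  # already imported by an earlier run, skip it
        store[comment["id"]] = comment
        created += 1
    return created


store = {}
first_export = [{"id": "p1", "html": "<p>One</p>"}, {"id": "p2", "html": "<p>Two</p>"}]
second_export = first_export + [{"id": "p3", "html": "<p>Three</p>"}]

created_first = import_comments(store, first_export)    # both comments are new
created_again = import_comments(store, first_export)    # nothing new
created_second = import_comments(store, second_export)  # only p3 is new
```

Keying on the external id rather than on position or content is what lets a later, slightly larger export be imported on top of an earlier one without creating duplicates.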

                    1. Steve Mitchell @SteveM 2019-06-23 20:52:41.556Z replies to KajMagnus:

                      The comment itself doesn't have a link, but is attached to a link/page/item. That link can be changed using various Disqus importer options and tools, and can create havoc if you're testing with a staging system, as an example. I've copied comments, which must have some unique identifier in Disqus' system, over to a staging site, and Disqus has happily updated all of my live, production comments to point to that page URL.

                      All of a sudden, comments are then showing up as "new" to some subscribers, and they are being directed to a potentially bad site that could be in a state of transition, etc. One of the (many) reasons I hate Disqus :)

                      1. Steve Mitchell @SteveM 2019-07-03 04:58:17.757Z replies to KajMagnus:

                        How is the work going on the importer? I have been using Talkyard in a limited fashion, and run through all of the other similar privacy-focused solutions out there, and have yet to find one that has such a well done interface. The only differentiator on the other solutions is their existing Disqus import, which I'm hoping to use here too!

                        1. KajMagnus @KajMagnus 2019-07-03 05:16:00.999Z replies to SteveM:

                          @SteveM I'm actually working on adding test code for the Disqus importer right now. I've implemented the importer; doing code review and adding automatic tests will take about two more weeks. ... There'll also be some more things included in the next Talkyard release (namely upserting forum categories via API, and exporting one's comments, to avoid lock-in) ... so the next release, with Disqus import, will be available in about a month.

                          run through all of the other similar privacy-focused solutions out there

                          Can I ask which websites / places did you primarily visit, to find out which commenting systems exist? (Maybe there're some places that don't mention Talkyard; then I could contact them and let them know there's Talkyard too)

                          1. Steve Mitchell @SteveM 2019-07-03 16:42:57.671Z replies to KajMagnus:

                            Thanks for the update. Looking forward to importing my comments.

                            I can't remember any spots that didn't already cite Talkyard. I use Ghost as a blogging platform, and they of course just highlighted Talkyard as well as Commento and already have integrations with Discourse (not that great) and Disqus. Most of the other spots already have Talkyard listed from what I remember.