
Bulk import from Disqus

By Jason @detly 2018-04-22 11:33:46.705Z

I'm migrating over from Disqus to Talkyard, and I have a lot of old comments in Disqus. Fortunately Disqus allows you to export comments, so I'm wondering what the best way to import them into Talkyard would be.

Just to be clear on what I'm asking, I don't need a "does-everything-for-me" wizard that understands Disqus' export format. I'm happy to write some Python and parse the XML. It's the "getting it into Talkyard" step I'm not sure about. I'd be submitting a massive number of comments under other people's email addresses, so I'm thinking I might need to turn off the spam/flood/auth filters, unless there's a method for an admin to post on behalf of someone else?

But more generally, how would I programmatically populate these comments? Are there docs for this, or a particular source file I should look at? Disqus gives me:

  • Display name
  • Email
  • Date/time
  • IP address
  • Comment
  • Threading information
  • The post it was on (via another XML section)
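
A minimal sketch of reading those fields with Python's standard library. The element and namespace names below (thread, post, author, dsq:id and so on) are assumptions modeled on a typical Disqus export, and the sample document is made up, so check them against your own export file before relying on this.

```python
import xml.etree.ElementTree as ET

# Assumed namespaces; verify against the root element of your own export.
NS = {"d": "http://disqus.com", "dsq": "http://disqus.com/disqus-internals"}
DSQ_ID = "{http://disqus.com/disqus-internals}id"

# Made-up sample, shaped like the assumed export layout.
SAMPLE = """
<disqus xmlns="http://disqus.com" xmlns:dsq="http://disqus.com/disqus-internals">
  <thread dsq:id="100">
    <link>https://blog.example.com/a-post/</link>
  </thread>
  <post dsq:id="200">
    <message><![CDATA[<p>First!</p>]]></message>
    <createdAt>2014-01-02T03:04:05Z</createdAt>
    <author><email>ann@example.com</email><name>Ann</name></author>
    <ipAddress>203.0.113.7</ipAddress>
    <thread dsq:id="100"/>
  </post>
  <post dsq:id="201">
    <message><![CDATA[<p>Reply</p>]]></message>
    <createdAt>2014-01-02T04:00:00Z</createdAt>
    <author><email>bob@example.com</email><name>Bob</name></author>
    <ipAddress>203.0.113.8</ipAddress>
    <thread dsq:id="100"/>
    <parent dsq:id="200"/>
  </post>
</disqus>
"""

def parse_disqus(xml_text):
    """Return one dict per comment with the fields listed above."""
    root = ET.fromstring(xml_text)
    # Map each thread id to the URL of the blog post it belongs to.
    thread_urls = {
        t.get(DSQ_ID): t.findtext("d:link", namespaces=NS)
        for t in root.findall("d:thread", NS)
    }
    comments = []
    for p in root.findall("d:post", NS):
        parent = p.find("d:parent", NS)
        thread = p.find("d:thread", NS)
        comments.append({
            "disqus_id": p.get(DSQ_ID),
            "name": p.findtext("d:author/d:name", namespaces=NS),
            "email": p.findtext("d:author/d:email", namespaces=NS),
            "created_at": p.findtext("d:createdAt", namespaces=NS),
            "ip": p.findtext("d:ipAddress", namespaces=NS),
            "html": p.findtext("d:message", namespaces=NS),
            "parent_id": parent.get(DSQ_ID) if parent is not None else None,
            "page_url": thread_urls.get(thread.get(DSQ_ID)) if thread is not None else None,
        })
    return comments

comments = parse_disqus(SAMPLE)
```

Each resulting dict carries the threading information as parent_id (None for top-level comments) and the owning post as page_url.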

Where should I start?

  • 38 replies


  1. KajMagnus @KajMagnus 2018-04-23 13:58:42.617Z

    would I programmatically populate these comments? Are there docs for this, or a particular source file I should look at?

    There is an HTTP endpoint to which one can POST a JSON file, with users, emails, topics, comments etcetera, in a Talkyard-specific JSON structure. But right now it's for end-to-end tests only (it says 401 Forbidden for anything that isn't an end-to-end test).

    I could look into enabling it for "real" usage, and see what more things it maybe needs to do / support, to be able to import Disqus comments.

    And then, if you write a Python script that converts from Disqus XML to Talkyard's JSON format, you could send the JSON file to the Talkyard server (when you're logged in as admin), and all comments would get imported.

    (Here's the source: https://github.com/debiki/talkyard/blob/master/app/controllers/ImportExportController.scala )

    Here's how the JSON looks: (this JSON creates an end-to-end test site. It's an excerpt; I deleted things that are off-topic for Disqus comments)

    (this is just to give you an idea about roughly how it looks. You probably need more details to be able to write the Python script. Also there're some fields below that you don't need to send to the server; it could fill them in itself.)

    {
      "members": [
        {
          "id": 101,
          "username": "owen_owner",
          "fullName": "Owen Owner",
          "createdAtMs": 1449198824000,
          "emailAddress": "e2e-test--owen-owner@example.com",
          "emailVerifiedAtMs": 1449198824000,
          "passwordHash": "cleartext:publicOwen123",
          "password": "publicOwen123",
          "isOwner": true,
          "isAdmin": true,
          "trustLevel": 2
        },
        {
          "id": 102,
          "username": "mod_mons",
          "fullName": "Mod Mons",
          "createdAtMs": 1449198824000,
          "emailAddress": "e2e-test--mod-mons@example.com",
          "emailVerifiedAtMs": 1449198824000,
          "passwordHash": "cleartext:publicMons123",
          "password": "publicMons123",
          "isModerator": true,
          "trustLevel": 2
        }
        ...
      ],
      "identities": [],
      "guests": [
        {
          "id": -10,
          "fullName": "Guest Gunnar",
          "createdAtMs": 1449198824000,
          "emailAddress": "e2e-test--guest-gunnar@example.com",
          "isGuest": true
        }
        ...
      ],
      "pages": [
        {
          "id": "byMariaCategoryA",
          "role": 12,
          "categoryId": 2,
          "authorId": 106,
          "createdAtMs": 1449198824000,
          "updatedAtMs": 1449198824000,
          "version": 1
        },
        {
          "id": "byMariaCategoryA_2",
          "role": 12,
          "categoryId": 2,
          "authorId": 106,
          "createdAtMs": 1449198824000,
          "updatedAtMs": 1449198824000,
          "version": 1
        }
        ...
      ],
      "pagePaths": [
        {
          "folder": "/",
          "pageId": "byMariaCategoryA",
          "showId": false,
          "slug": "by-maria-category-a"
        },
        {
          "folder": "/",
          "pageId": "byMariaCategoryA_2",
          "showId": false,
          "slug": "by-maria-category-a-2"
        }
        ...
      ],
      "posts": [
        {
          "id": 114,
          "pageId": "byMariaCategoryA",
          "nr": 1,
          "createdAtMs": 1449198824000,
          "createdById": 106,
          "currRevStartedAtMs": 1449198824000,
          "currRevById": 106,
          "numDistinctEditors": 1,
          "approvedSource": "By Maria in CategoryA, text text text.",
          "approvedHtmlSanitized": "<p>By Maria in CategoryA, text text text.</p>",
          "approvedAtMs": 1449198824000,
          "approvedById": 1,
          "approvedRevNr": 1,
          "currRevNr": 1
        },
        {
          "id": 115,
          "pageId": "byMariaCategoryA_2",
          "nr": 0,
          "createdAtMs": 1449198824000,
          "createdById": 106,
          "currRevStartedAtMs": 1449198824000,
          "currRevById": 106,
          "numDistinctEditors": 1,
          "approvedSource": "By Maria in CategoryA nr 2 title",
          "approvedHtmlSanitized": "By Maria in CategoryA nr 2 title",
          "approvedAtMs": 1449198824000,
          "approvedById": 1,
          "approvedRevNr": 1,
          "currRevNr": 1
        }
        ...
      ]
    }
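
For the conversion script, the same top-level layout can be assembled from plain Python dicts and serialized with the json module. A rough sketch, assuming the field names in the excerpt above are the ones the importer expects. The helpers iso_to_ms, make_guest and make_site are hypothetical names, and the timestamp format is an assumption about what the Disqus export contains.

```python
import json
from datetime import datetime, timezone

def iso_to_ms(iso_text):
    """Convert e.g. '2015-12-04T03:13:44Z' (assumed format) to epoch milliseconds."""
    dt = datetime.strptime(iso_text, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
    return int(dt.timestamp()) * 1000

def make_guest(guest_id, name, email, created_iso):
    # Field names copied from the "guests" entries in the excerpt.
    return {
        "id": guest_id,
        "fullName": name,
        "createdAtMs": iso_to_ms(created_iso),
        "emailAddress": email,
        "isGuest": True,
    }

def make_site(guests, pages, page_paths, posts):
    # Same top-level keys as the excerpt; members and identities are left empty here.
    return {
        "members": [],
        "identities": [],
        "guests": guests,
        "pages": pages,
        "pagePaths": page_paths,
        "posts": posts,
    }

gunnar = make_guest(-10, "Guest Gunnar",
                    "e2e-test--guest-gunnar@example.com", "2015-12-04T03:13:44Z")
site = make_site([gunnar], [], [], [])
payload = json.dumps(site, indent=2)
```

The payload string is what would eventually be sent to the server; how and where to send it is not settled at this point in the thread, so that step is left out.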
    
    1. Jason @detly 2018-04-28 10:42:04.690Z

      Thanks for this! I'll work on a script to create the JSON, and maybe by the time I've finished either you'll have an endpoint for it to be posted to or I'll have learnt Scala.

      A few questions:

      • Just overall, which source file should I dig into to understand the structure of this?
      • I notice that the guest ID is -10. Are all guest IDs negative?
      • How does threading work? Can I link a post to a parent post?
      • What's nr in the post data?
      • Can I skip the approvedSource since I already have my sanitised HTML via Disqus?
      1. KajMagnus @KajMagnus 2018-04-30 10:30:19.173Z

        Ok :- )

        which source file should I dig into

        The end-to-end test files I would suggest. They create the JSON structure a Disqus importer also would need to create. Look here:

        • A TypeScript definition of the JSON structure, interface SiteData in tests/e2e/test-types.ts

        • A function that constructs a discussion topic and adds to that JSON structure: addPage, here in tests/e2e/utils/site-builder.ts.
          The field role: PageRole should be set to PageRole.EmbeddedComments = 5 (an enum) for embedded comments topics.
          (here's that enum: client/app/model.ts )

        • How to create user JSON objects: functions like memberMaria and guestGunnar, in tests/e2e/utils/make.ts

        • Adding users to the JSON obj, in site-builder.ts
          e.g. site.members.push(forum.members.mallory);

          You could either 1) import the Disqus users into guest accounts (they don't need any password or username), or 2) into "real" accounts, i.e. with password and username. I suppose you'd then generate random passwords, and if someone who has previously commented on your blog wants to continue using the same account, s/he would click "Forgot password" and get a password reset email.

        Are all guest IDs negative?

        Yes, <= -10 are for guests, and >= 100 are for members with real accounts. There are some magic ids too, from -9 up to +9, like +1 for the System user. And (in case you're curious) default built-in groups (Everyone, New Members, ... Regular Members, Core Members) have ids 10, 11, 12, ...
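
Those ranges can be written down as a small helper, handy for sanity-checking generated JSON. A sketch only: classify_user_id is a hypothetical name, and treating everything from 10 to 99 as groups or reserved is an assumption, since only the first few group ids are mentioned above.

```python
def classify_user_id(user_id):
    """Classify an id according to the ranges described above."""
    if user_id <= -10:
        return "guest"
    if user_id >= 100:
        return "member"
    if -9 <= user_id <= 9:
        return "magic"  # e.g. 1 is the System user
    # 10..99: built-in groups start at 10; the rest of the range is assumed reserved.
    return "group-or-reserved"

kinds = [classify_user_id(i) for i in (-10, 1, 10, 101)]
```

A converter that creates guests would therefore hand out ids -10, -11, -12 and so on, and never anything between -9 and 99.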

        How does threading work? Can I link a post to a parent post?
        What's nr in the post data?

        One links to the parent post, via the field parentNr. Each post has a field nr which is the order in which that post was added to the discussion.

        The page title has nr = 0, page body (a.k.a. the Original Post, for forum topics) has nr 1. The first comment has nr 2, and parentNr = 1. The 2nd comment has nr = 3, and parentNr is 1 or 2, depending on if it replies to the blog post = nr 1, or to the first comment = nr 2. And so on.
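
The numbering rule can be sketched for one page's comments as follows. This is an illustration under assumptions: the input dicts use hypothetical keys (disqus_id, parent_id, created_at), timestamps share one sortable ISO format, and a reply whose parent is missing is attached to the page body.

```python
BODY_NR = 1  # nr 0 is the page title, nr 1 the page body

def assign_post_nrs(comments):
    """Give each comment on one page an nr and a parentNr, oldest first."""
    nr_by_disqus_id = {}
    numbered = []
    ordered = sorted(comments, key=lambda c: c["created_at"])
    for offset, comment in enumerate(ordered):
        nr = BODY_NR + 1 + offset  # the first comment gets nr 2
        nr_by_disqus_id[comment["disqus_id"]] = nr
        # Top-level comments, and replies to unknown parents, point at the body.
        parent_nr = nr_by_disqus_id.get(comment["parent_id"], BODY_NR)
        numbered.append({**comment, "nr": nr, "parentNr": parent_nr})
    return numbered

page_comments = [
    {"disqus_id": "a", "parent_id": None, "created_at": "2014-01-01T10:00:00Z"},
    {"disqus_id": "b", "parent_id": "a", "created_at": "2014-01-01T11:00:00Z"},
    {"disqus_id": "c", "parent_id": None, "created_at": "2014-01-01T12:00:00Z"},
]
numbered = assign_post_nrs(page_comments)
```

Here comment a gets nr 2 with parentNr 1, its reply b gets nr 3 with parentNr 2, and c gets nr 4 with parentNr 1.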

        Embedded discussion pages have auto generated titles like "Comments for <the blog post url>".

        There's also an id field, which uniquely identifies a comment in the database. nr is unique within a certain discussion only. If an admin moves a comment from one discussion to another, it'll get a new nr, but keep the same id.

        Note to myself: I'll probably need to make the importer work, without any id fields. It's not really possible for you to know which ids to use, since there are some ids in the database already (and those should be avoided).

        Can I skip the approvedSource since I already have my sanitised HTML via Disqus?

        It's used for editing: If someone decides to edit a post (e.g. you; admins can edit others' posts), the editor will display the source for that comment (which is the approvedSource field in the JSON to import).

        You can set approvedSource to the HTML exported from Disqus — that is, set both approvedSource and approvedHtmlSanitized to the post's HTML. Then, if someone wants to edit a comment imported from Disqus, s/he'll see & can edit the HTML from Disqus.
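
As a sketch, a post entry built that way could look like this. The field names come from the JSON excerpt earlier in the thread; make_post is a hypothetical helper, and leaving out the id field follows the note above rather than any confirmed importer behavior. Whether the server sanitizes the HTML again on import is not stated here.

```python
def make_post(page_id, nr, parent_nr, author_id, created_at_ms, disqus_html):
    """Build one entry for the "posts" list from a Disqus comment."""
    return {
        # No "id": per the note above, the importer may need to assign ids itself.
        "pageId": page_id,
        "nr": nr,
        "parentNr": parent_nr,
        "createdAtMs": created_at_ms,
        "createdById": author_id,
        "currRevStartedAtMs": created_at_ms,
        "currRevById": author_id,
        "numDistinctEditors": 1,
        # Both fields get the HTML exported from Disqus.
        "approvedSource": disqus_html,
        "approvedHtmlSanitized": disqus_html,
        "approvedAtMs": created_at_ms,
        "approvedById": 1,  # 1 is the System user
        "approvedRevNr": 1,
        "currRevNr": 1,
    }

post = make_post("examplePageId", 2, 1, -10, 1449198824000, "<p>First!</p>")
```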


        I hope this helps :- ) & I've started looking a little at what I need to do server side.

        1. KajMagnus @KajMagnus 2018-04-30 15:51:13.933Z

          Mentioning @detly. So you'll get a notification email and see my comment above.

          (About a week ago I changed the email notification sent-from address, but forgot to verify the new sent-from address, so no emails got sent :- P )

    2. Progress with doing this idea
    3. Steve Mitchell @SteveM 2019-04-19 23:15:48.477Z

      Did anyone get code or bits and pieces working to enable this? I have a site I'd love to migrate to Talkyard, but it has lots of comments in Disqus that I can't lose, and would need to import...

      1. KajMagnus @KajMagnus 2019-04-20 05:37:09.420Z replies to SteveM:

        Hi @SteveM, right now there's no import-from-Disqus (that I know about). I started writing an importer, then postponed that. Recently I did a bit more related work ... Maybe in two months, there'll be an importer.

        1. @KajMagnus marked this topic as Started 2019-06-11 15:12:24.104Z.
        2. KajMagnus @KajMagnus 2019-06-11 15:12:11.895Z 2019-06-11 15:20:06.195Z

          @SteveM (and Jason): I've now resumed working on the Disqus comments importer. Likely it'll be available in one or two months.

          Maybe in the beginning, before it's "100% well tested", it'll work like this: you'd send me a Disqus XML export file, and I'd first test-import it myself to a test server to verify all is fine, and then import it to the real production environment, before letting people do this themselves.

          Probably I'll enable exporting-one's-site-as-JSON for everyone also, as part of this.

          1. Steve Mitchell @SteveM 2019-06-11 15:23:37.043Z replies to KajMagnus:

            Happy to provide an exported copy of mine, as long as you're careful not to change the URLs or otherwise, as Disqus might then start pointing follow on comments to you!

            1. KajMagnus @KajMagnus 2019-06-18 12:47:10.577Z replies to SteveM:

              Ok, that sounds good. Actually I don't quite understand this: "Disqus might then start pointing follow on comments to you" — what's that? I mean, pointing follow on comments to Talkyard?

              With not changing the URLs, that means that the blog comments should be available at the exact same page url, when using Talkyard, as with Disqus? So if https://server/the/page is the url to a page with Disqus comments, then, after importing to Talkyard, those same comments should thereafter appear at the exact same url, also when using Talkyard? (That's how I have in mind to make things work)

              1. Steve Mitchell @SteveM 2019-06-18 16:13:10.411Z replies to KajMagnus:

                Ah sorry.... Importers will often change the URL to point to another site while testing, which changes the original comment to point there, which I definitely don't want. That's all I meant.

                1. KajMagnus @KajMagnus 2019-06-23 00:13:28.274Z replies to SteveM:

                  Thanks for explaining, .... Hmm, how can a comment point to anywhere? I'm thinking a comment is text, not a link?

                  1. KajMagnus @KajMagnus 2019-06-23 00:15:12.497Z

                    Brief status update: Coding wise, this is mostly done, however maybe 1 week for fixing not-so-common "corner cases" and 1 week code review and 1 week writing tests, remain ...

                    I'm making the imports idempotent, meaning, if one imports the same Disqus comments many times, no duplicated comments get created. And one can import a Disqus export file, then import the almost same file again but with a few more comments, and this'll work fine: the 2nd time, only the a-few-more-comments (that weren't present in the 1st export file) get created in Talkyard.
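
The idea of an idempotent import can be sketched with a store keyed on a stable per-comment identifier. This is a toy model, not the importer's code: upsert_comments, the in-memory dict, and the disqus_id key are all assumptions for illustration.

```python
def upsert_comments(store, incoming):
    """Add comments not seen before; return the ids that were newly created."""
    created = []
    for comment in incoming:
        ext_id = comment["disqus_id"]
        if ext_id in store:
            continue  # already imported, so importing again changes nothing
        store[ext_id] = comment
        created.append(ext_id)
    return created

store = {}
first_dump = [{"disqus_id": "200", "html": "<p>First!</p>"},
              {"disqus_id": "201", "html": "<p>Reply</p>"}]
later_dump = first_dump + [{"disqus_id": "202", "html": "<p>Late comment</p>"}]

first_run = upsert_comments(store, first_dump)    # creates both comments
repeat_run = upsert_comments(store, first_dump)   # creates nothing
later_run = upsert_comments(store, later_dump)    # creates only the new one
```

Importing the same dump twice leaves the store unchanged, and importing a newer dump adds only the comments that were not there before.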

                    1. Steve Mitchell @SteveM 2019-06-23 20:52:41.556Z replies to KajMagnus:

                      The comment itself doesn't have a link, but is attached to a link/page/item. That link can be changed using various Disqus importer options and tools, and can create havoc if you're testing with a staging system, as an example. I've copied comments, which must have some unique identifier in Disqus' system, over to a staging site, and Disqus has happily updated all of my live, production comments to point to that page URL.

                      All of a sudden, comments are then showing up as "new" to some subscribers, and they are being directed to a potentially bad site that could be in a state of transition, etc. One of the (many) reasons I hate Disqus :)

                      1. Steve Mitchell @SteveM 2019-07-03 04:58:17.757Z replies to KajMagnus:

                        How is the work going on the importer? I have been using Talkyard in a limited fashion, and run through all of the other similar privacy-focused solutions out there, and have yet to find one that has such a well done interface. The only differentiator on the other solutions is their existing Disqus import, which I'm hoping to use here too!

                        1. KajMagnus @KajMagnus 2019-07-03 05:16:00.999Z replies to SteveM:

                          @SteveM I'm actually working on adding test code for the Disqus importer right now. I've implemented the importer, and doing code review and adding automatic tests will take about two more weeks. ... There'll also be some more things included in the next Talkyard release (namely upserting forum categories via API, and exporting one's comments, to avoid lock-in) ... so the next release, with Disqus import, will be available in about a month.

                          run through all of the other similar privacy-focused solutions out there

                          Can I ask which websites / places did you primarily visit, to find out which commenting systems exist? (Maybe there're some places that don't mention Talkyard; then I could contact them and let them know there's Talkyard too)

                          1. Steve Mitchell @SteveM 2019-07-03 16:42:57.671Z replies to KajMagnus:

                            Thanks for the update. Looking forward to importing my comments.

                            I can't remember any spots that didn't already cite Talkyard. I use Ghost as a blogging platform, and they of course just highlighted Talkyard as well as Commento and already have integrations with Discourse (not that great) and Disqus. Most of the other spots already have Talkyard listed from what I remember.

                            1. KajMagnus @KajMagnus 2019-07-29 08:36:16.220Z

                              Status update: Now I've written the import-Disqus-comments code, and added automatic tests. Seems to work fine. Next, code review. About one week. And fixing things I find during code review, one more week? And thereafter, I think I can import your Disqus comments dumps into your Talkyard .net sites. @SteveM and @detly

                              1. Steve Mitchell @SteveM 2019-08-16 19:17:11.699Z replies to KajMagnus:

                                Any new news on this feature? Would really love to get away from Disqus!

                                1. KajMagnus @KajMagnus 2019-08-17 12:27:55.833Z 2019-08-17 12:36:04.215Z replies to SteveM:

                                  Hi Steve, I think everything is ready, except that I have 2 pages of Disqus importer code left to code review (and I don't expect to find anything "interesting"). ... Thereafter, I'd like to start with importing Jason's (@detly's) comments; I have a dump of his comments already. This will probably happen next week. After that, I can message you, and you can send me a Disqus export, and I'll import your comments? At the end of the next week or the week after, I would think.

                                  Or by the way, maybe you'd like to send me a Disqus comments xml export file now directly? Then I can test import it next week, to a test server, and have a look that all seems fine. ... And if no additional Disqus comments get posted during that time, I can import that same dump to the real server.

                                  kajmagnus at talkyard.io

                                  1. Steve Mitchell @SteveM 2019-08-17 16:40:14.234Z replies to KajMagnus:

                                    Sounds good! Happy to provide my Disqus export when needed.

                                    1. KajMagnus @KajMagnus 2019-08-28 19:30:09.297Z replies to SteveM:

                                      Status update: I just did code review of the Disqus import code, and test imported Jason's @detly comments on localhost, worked fine. Next week I'll import to the real server. (Sorry, everything got delayed 1 week because I felt I had to do some growth hacking / marketing in between :- ))

                                      1. Steve Mitchell @SteveM 2019-08-28 19:33:11.570Z replies to KajMagnus:

                                        Sounds good. Let me know when you would like my data.

                                        1. KajMagnus @KajMagnus 2019-09-04 07:07:12.315Z replies to SteveM:

                                          Hi @SteveM would you like to send me your Disqus data? Then I can test import it to a test site and verify that all works fine, before importing to your real site. My email is kajmagnus at talkyard.io

                                          1. Steve Mitchell @SteveM 2019-09-04 07:26:49.853Z replies to KajMagnus:

                                            Sent! Let me know if you have any questions or issues.

                                            1. KajMagnus @KajMagnus 2019-09-04 08:38:30.061Z replies to SteveM:

                                              Thanks, I've downloaded it now (worked fine).

                                              1. KajMagnus @KajMagnus 2019-09-21 06:32:45.105Z replies to SteveM:

                                                Today I test imported to localhost, after having made the final (?) fixes to the importer. Seems to have worked fine. — Now I just need to code review, update the server and then I can import to your real site.

                                                (Details: I've fixed some things with the importer, e.g. reuse the <id> field as Talkyard's "external id", so it'll be safe to re-import the same Disqus comments dump, maybe with additional more recently posted comments, without duplicating anything. And today I made some final (?) fixes related to too-grumpy Talkyard consistency checks, and weird Disqus dump timestamps.)

                                                1. KajMagnus @KajMagnus 2019-10-04 07:38:10.325Z 2019-10-04 07:44:30.121Z

                                                  Jason @detly and @SteveM, today I imported Jason's Disqus comments. Seems to have worked fine.

                                                  Example: http://heeris.id.au/2014/if-programming-languages-were-harry-potter-characters/

                                                  Jason, the URLs have changed, for some of the really old blog posts, since your blog was created: Nowadays (the last 5 years or so), the blog posts end with a /, but before that, there was no slash. Resulting in the comments not appearing, for those old posts, because they're associated with the old no-slash URL. — So, I need to add an interface for editing the embedding URL, for blog comments discussions. Thereafter, I'll be able to update the URLs.

                                                  (Maybe the blog was migrated to some different software long ago? Which adds a / to the URLs.)

                                                  These blog comments, for example, won't appear: https://comments-for-heeris-id-au.talkyard.net/-27/imported-from-disqus

                                                  Details:

                                                  Originally, the URL was: https://heeris.id.au/2013/this-is-why-you-shouldnt-interrupt-a-programmer (with no trailing slash), and all comments are associated with this URL.

                                                  Nowadays though, that blog post is instead located at: https://heeris.id.au/2013/this-is-why-you-shouldnt-interrupt-a-programmer/ (with a slash). But in the Disqus dump, and now in Talkyard, this .../ URL has no comments.
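
The mismatch can be reproduced with a small lookup that tries both spellings of a URL. This is only an illustration of the problem and of one possible workaround; it is not how Talkyard resolves embedding URLs, and url_variants, find_comments and the dictionary are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit

def url_variants(url):
    """Return the URL as given, plus the same URL with the trailing slash toggled."""
    parts = urlsplit(url)
    if parts.path.endswith("/"):
        alt_path = parts.path.rstrip("/")
    else:
        alt_path = parts.path + "/"
    alt = urlunsplit((parts.scheme, parts.netloc, alt_path, parts.query, parts.fragment))
    return [url, alt]

def find_comments(comments_by_url, page_url):
    """Look up comments by exact URL first, then by the slash-toggled variant."""
    for candidate in url_variants(page_url):
        if candidate in comments_by_url:
            return comments_by_url[candidate]
    return []

old_url = "https://heeris.id.au/2013/this-is-why-you-shouldnt-interrupt-a-programmer"
comments_by_url = {old_url: ["comment 1", "comment 2"]}

exact_only = comments_by_url.get(old_url + "/", [])        # finds nothing
with_variants = find_comments(comments_by_url, old_url + "/")
```

An exact-match lookup on the new slash-terminated URL finds nothing, while the variant lookup finds the comments stored under the old URL.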

                                                  1. Steve Mitchell @SteveM 2019-10-12 04:46:14.254Z replies to KajMagnus:

                                                    I've cleaned up some of my Disqus data after finding a ton of weird URLs when doing a full dump of all comments. I can send you an updated file if needed.

                                                    Any ETA on when this importer would be available? I badly want to get rid of Disqus as fast as possible!

                                                    1. KajMagnus @KajMagnus 2019-10-12 12:57:51.006Z replies to SteveM:

                                                      @SteveM Possibly on Monday next week — I'm done building the URL "fixer", so one can edit and fix old incorrect URLs (e.g. add a trailing slash or remove, as needed). I'll try fixing Jason's blog tomorrow (the 5 year old posts with broken URLs) and then I can message you, and import your comments on Monday?

                                                      Would you like to email me your new Disqus dump? If you do, is it then OK if I import it tomorrow or on Monday without asking you, or should I confirm with you first?

                                                      1. KajMagnus @KajMagnus 2019-10-13 12:44:11.413Z replies to SteveM:

                                                        Now I've fixed all URLs over at Jason's blog, worked fine. Jason @detly The comments are back :- )

                                                        https://heeris.id.au/2013/this-is-why-you-shouldnt-interrupt-a-programmer/ (scroll down to see the comments)

                                                        Wow, 1.5 years after this topic was initially opened. This took a while and many small steps.

                                                        @SteveM, feel free to send me a new Disqus dump when you have time, and I'll import it to Seabits.

                                                        1. Steve Mitchell @SteveM 2019-10-15 01:46:39.406Z replies to KajMagnus:

                                                          Just sent you an updated XML export via email. Thanks!

                                                          1. KajMagnus @KajMagnus 2019-10-16 10:34:52.263Z replies to SteveM:

                                                            I got the file, small problems, I'll continue tomorrow. Details: Nginx accepts only a 1 MB upload, whilst the file, converted to Talkyard JSON, is 1.3 MB. I've configured Nginx to accept 10 MB, but only in one place; apparently I need to do this in 3 different places in Nginx. I'll look into this tomorrow.

                                                            1. KajMagnus @KajMagnus 2019-10-17 08:57:27.086Z replies to SteveM:

                                                              @SteveM, now I've imported the comments. Here: https://comments-for-seabits-com.talkyard.net/ (you need to login to see the imported comments).

                                                              (And here're the instructions for adding to the blog: https://comments-for-seabits-com.talkyard.net/-/admin/settings/embedded-comments )

                                                              1. Steve Mitchell @SteveM 2019-10-25 23:30:26.205Z replies to KajMagnus:

                                                                Well, I finally got around to adding this, and no comments are showing up.

                                                                I have another competing solution embedded just below and it is showing everything OK, so something must still be up with mapping or?

                                                                Do I need to go in and add every ID from every Ghost page to each of the discussions in Talkyard? I thought it would map them based on the URL....

                                                                1. KajMagnus @KajMagnus 2019-10-28 07:31:42.438Z replies to SteveM:

                                                                  I'll have a look later today (sorry didn't see your message). The Ghost discussion ids & URLs should get imported & work automatically.

                                                                  1. KajMagnus @KajMagnus 2019-10-28 09:37:20.364Z 2019-10-28 09:56:28.638Z replies to SteveM:

                                                                    You use Ghost right? I think the problem might be that my copy-paste instructions use this code for generating a discussion id:

                                                                    data-discussion-id="ghost-{{comment_id}}"
                                                                    

                                                                    However that won't work for already existing pages — because they don't have the "ghost- ..." prefix. So this looks like a bug by me.

                                                                    Would you like to try to replace this, in the HTML snippet you copy-pasted: (it's on the line in the middle)

                                                                    data-discussion-id="ghost-{{comment_id}}"
                                                                    

                                                                    with this:

                                                                    data-discussion-id="{{comment_id}}"
                                                                    

                                                                    And if that won't work, with just this:

                                                                    data-discussion-id=""
                                                                    

                                                                    I tested this last case, i.e. "", and that works for me. (This is b.t.w. how Commento works — it looks only at URLs, not Disqus' or Ghost's discussion ids.)

                                                                    For example, this HTML snippet works for me, for having the comments appear over at your blog:
                                                                    (I added an /etc/hosts entry on my laptop so I could test as if from your blog)

                                                                    <script>talkyardServerUrl='https://comments-for-seabits-com.talkyard.net';</script>
                                                                    <script async defer src="https://c1.ty-cdn.net/-/talkyard-comments.min.js"></script>
                                                                    <!-- You can specify a per page discussion id on the next line, if your URLs might change. -->
                                                                    <div class="talkyard-comments" data-discussion-id="" style="margin-top: 45px;">
                                                                    <noscript>Please enable Javascript to view comments.</noscript>
                                                                    <p style="margin-top: 25px; opacity: 0.9; font-size: 96%">Comments powered by
                                                                    <a href="https://www.talkyard.io">Talkyard</a>.</p>
                                                                    </div>
                                                                    

                                                                    With the cruft "Please enable Javascript..." removed, it's this:   (note: data-discussion-id="" )

                                                                    <script>talkyardServerUrl='https://comments-for-seabits-com.talkyard.net';</script>
                                                                    <script async defer src="https://c1.ty-cdn.net/-/talkyard-comments.min.js"></script>
                                                                    <div class="talkyard-comments" data-discussion-id="">
                                                                    </div>
                                                                    

                                                                    ***

                                                                    Edit: Actually Ghost's own instructions seem broken to me, in that they suggest using this: this.page.identifier = 'ghost-{{comment_id}}';, and that's what I was looking at when I wrote my code I think — and that doesn't look backwards compatible with ids imported from e.g. WordPress, Disqus etc. (will only work for new discussions, when setting up a new blog), because those old discussions won't have the "ghost-" prefix.

                                                                    https://ghost.org/docs/api/v2/handlebars-themes/context/post/#comment-id —>
                                                                    https://github.com/TryGhost/Casper/blob/d92dda3523c27d68fa78088cd1138300b96bc7c8/post.hbs#L83
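
Why the prefix matters can be shown with a toy lookup. This is a simplified model and not Talkyard's actual code: it assumes an embedded page is matched by discussion id when one is given and by page URL otherwise. The function resolve_discussion, both dictionaries, and the sample id and URL are hypothetical.

```python
def resolve_discussion(by_id, by_url, discussion_id, page_url):
    """Return the matching discussion, or None if nothing matches."""
    if discussion_id:
        # A non-empty id must match exactly what was imported.
        return by_id.get(discussion_id)
    # An empty id falls back to matching on the page URL.
    return by_url.get(page_url)

page_url = "https://blog.example.com/some-post/"
by_id = {"5d1f": "imported discussion"}        # imported ids carry no "ghost-" prefix
by_url = {page_url: "imported discussion"}

with_prefix = resolve_discussion(by_id, by_url, "ghost-5d1f", page_url)
without_prefix = resolve_discussion(by_id, by_url, "5d1f", page_url)
empty_id = resolve_discussion(by_id, by_url, "", page_url)
```

In this model the prefixed id matches nothing, while both the bare id and the empty id (URL fallback) find the imported discussion.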

                                                                    1. Steve Mitchell @SteveM 2019-10-28 15:54:35.669Z replies to KajMagnus:

                                                                      Ah, that did it! It's working now. Yes, I am using Ghost...

                                                                      Seems like there were some changes on their side that may have caused some issues at least with the discussion IDs and whether I had those from before.

                                                                      Glad we were able to figure that out!

                                                                      I have some feedback on emails and user settings, but I'm going to test and explore that now that it is working on the site first, read the docs, and then post something if I can't figure it out.

                                                                      Thanks for all of your help! Having my old comments was really important as there are so many good discussions and data in those that folks refer to.