iBlog -> WordPress

Strategies and help for moving an iBlog setup to another blogging platform

Post by Jerry » Tue Mar 17, 2009 3:59 pm

NOTE:
What follows is a long and rather technical explanation of my migration from iBlog 2 to WordPress. You don't need to understand all this to use the script. The script itself is attached in a reply later in this thread, along with (very) brief instructions.



What follows is a cut-and-paste from my blog, with much of the formatting messed up. It is probably easier to read over there.

I started using iBlog several years ago, when it was new and I was new to blogging. It had one advantage over other blogging packages: it came free with my .mac account back in the day and it worked on .mac servers, which are, to put it kindly, inflexible.

Two things have happened in the intervening years: first, all the blogging platforms have gotten much better, including the ability to work on the blog while offline. The second is that iBlog made an abortive step forward to iBlog 2, which was a major improvement, but then the whole company stalled before that release was really finished (although by then I was fully committed to it). I will miss iBlog 2, but not as much as I will enjoy getting my stuff onto a faster, more versatile platform.

After a rather exhaustive search of blogging platforms and content management systems, I settled on WordPress. While it's not perfect, it is a straightforward MySQL-Apache-PHP application that is easy to fiddle with, and some of the customizations I was looking for were much easier with WordPress than with others.

WordPress has a whole bunch of tools and instructions for importing your stuff from other blog systems. None of those did me much good at all, however, as iBlog was too obscure for anyone to worry about. After searching the Internet I found some helpful information, but it all applied to iBlog 1 - most people never made the move to the ill-fated upgrade. I was pretty much on my own.

WordPress can import data in a variety of formats, but it was up to me to get the data out of iBlog in a format WordPress could understand. The most versatile format was one created by the folks at WordPress, which could include information specific to WordPress. Cool! Decision made, I was on my way.

Except... the folks at WordPress have never bothered to document the structure of their files. Apparently it's something they've been meaning to get around to eventually (though the people writing translation software for the other major blogging software have long since muddled through it). I did what everyone else has had to do to export data: copy one of WordPress's files and fiddle with it until it works. Not only is this a pain in the patoot, there might be tags that don't appear in my examples that could nonetheless be useful to me. Oh, well.

I needed my import file to include definitions of categories, and then each of the blog entries, with correct category associations. My example file had a lot of fields that seemed redundant for my purposes, but without documentation I wasn't going to waste time trying to figure out which tags were required and which weren't.

Here is a very small (one episode) export file. We'll go into the details of things like nicename later:

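Code:
<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
   xmlns:content="http://purl.org/rss/1.0/modules/content/"
   xmlns:excerpt="http://wordpress.org/export/1.0/excerpt/"
   xmlns:dc="http://purl.org/dc/elements/1.1/"
   xmlns:wp="http://wordpress.org/export/1.0/">
<channel>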
   <title>Muddled Ramblings and Half-Baked Ideas</title>
   <link>http://jerssoftwarehut.com/muddled</link>
   <description>blog!</description>
   <pubDate>Thu, 28 Jun 2007 21:32:21 +0000</pubDate>
   <generator>Jers Very Clever Script</generator>
   <language>en</language>
   <wp:wxr_version>1.0</wp:wxr_version>
   <wp:base_site_url>http://jerssoftwarehut.com/muddled</wp:base_site_url>
   <wp:base_blog_url>http://jerssoftwarehut.com/muddled</wp:base_blog_url>


<wp:category>
   <wp:category_nicename>bars-of-the-world-tour</wp:category_nicename>
   <wp:category_parent></wp:category_parent>
   <wp:posts_private>0</wp:posts_private>
   <wp:links_private>0</wp:links_private>
   <wp:cat_name><![CDATA[Bars of the World Tour]]></wp:cat_name>
   <wp:category_description><![CDATA[blah blah blah]]></wp:category_description>
</wp:category>


<item>
   <title>Delayed by Weather</title>
   <link></link>
   <pubDate>2007-03-27 18:23:57</pubDate>
   <dc:creator><![CDATA[Jerry]]></dc:creator>
   <category><![CDATA[Bars of the World Tour]]></category>
   <category domain="category" nicename="bars-of-the-world-tour"><![CDATA[Bars of the World Tour]]></category>
   <content:encoded><![CDATA[<p>The Weather Channel is calling the roads around here "a big mess", so I'm going to take time out from driving and catch up on some writing. Unfortunately, TWC is also calling for dangerous surf and "rough bar conditions". I'd better leave the laptop in my room.</p>]]></content:encoded>
   <excerpt:encoded><![CDATA[&nbsp;]]></excerpt:encoded>
   <wp:post_id>1065</wp:post_id>
   <wp:post_date>2007-03-27 18:23:57</wp:post_date>
   <wp:post_date_gmt>2007-03-27 18:23:57</wp:post_date_gmt>
   <wp:comment_status>open</wp:comment_status>
   <wp:ping_status>open</wp:ping_status>
   <wp:post_name>Delayed by Weather</wp:post_name>
   <wp:status>publish</wp:status>
   <wp:post_parent>0</wp:post_parent>
   <wp:post_type>post</wp:post_type>
</item>

</channel>
</rss>


But how to create the file? The data for iBlog 2 is distributed over (literally) thousands of files. Writing a program to track down all the information and make sense of it would be a major chore. That's where AppleScript came in. iBlog's programmer took the time to provide access to the iBlog data through the Apple Scripting system. I was able to let iBlog read all of its silly scattered files and make sense of them, then provide the data to me in a coherent fashion. So far, so good. All I needed to do was loop through all the episodes, pull out the data I needed, and shovel it into a text file that WordPress could read.

[IMPORTANT NOTE: I've tried to go back and reconstruct the scripts as they were at the appropriate stage in development, but the snippets are untested.]

[ALSO IMPORTANT: you don't really have to understand the code. If you are in this boat, I will help you. You should understand the challenges, but I'm here for you.]


Code:
on run
   set exportFile to 0
   
   try
      -- niceName is not defined yet at this point, so this single-file version uses a fixed file name ("iBlogExport" is just a placeholder)
      set exportFile to open for access "Users:JerryTi:Documents:scripts:iBlogExport.xml" with write permission
      set eof of exportFile to 0
      write rssHead to exportFile as «class utf8» -- xml/rss header stuff that's always the same; written once, before any categories
      tell application "iBlog" to set cats to the categories of the first blog
      repeat with cat in cats
         tell application "iBlog" to set catname to (the name of cat) as text
         set niceName to the first word of catname
         set catDescription to "blah blah blah"
         -- write out the category info
         tell application "iBlog" to set nextText to "<wp:category>" & newLine & tab & "<wp:category_nicename>" & niceName & "</wp:category_nicename>" & newLine & tab & "<wp:category_parent></wp:category_parent>" & newLine & tab & "<wp:posts_private>0</wp:posts_private>" & newLine & tab & "<wp:links_private>0</wp:links_private>" & newLine & tab & "<wp:cat_name><![CDATA[" & catname & "]]></wp:cat_name>" & newLine & tab & "<wp:category_description><![CDATA[" & catDescription & "]]></wp:category_description>" & newLine & "</wp:category>" & newLine & newLine
         
         write nextText to exportFile as «class utf8» -- have to coerce the text from 16-bit unicode
         tell application "iBlog" to set ents to the entries of cat
         repeat with ent in ents
            -- get the stuff in iBlog's world, work with it here
            tell application "iBlog"
               set titl to (the title of ent)
               set desc to (the summary of ent)
               set bod to (the body of ent)
               set postDate to the post date of ent
            end tell
            
            set nextText to ((("<item>" & newLine & tab & "<title>" & titl & "</title>" & newLine & tab & "<link></link>" & newLine & tab & "<pubDate>" & postDate) & "</pubDate>" & newLine & tab & "<dc:creator><![CDATA[Jerry]]></dc:creator>" & newLine & tab & "<category><![CDATA[" & the name of cat & "]]></category>" & newLine & tab & "<category domain=\"category\" nicename=\"" & niceName & "\"><![CDATA[" & the name of cat & "]]></category>" & newLine & tab & "<content:encoded><![CDATA[" & bod & "]]></content:encoded>" & newLine & tab & "<excerpt:encoded><![CDATA[" & desc & "]]></excerpt:encoded>" & newLine & tab & "<wp:post_id></wp:post_id>" & newLine & tab & "<wp:post_date>" & postDate) & "</wp:post_date>" & newLine & tab & "<wp:post_date_gmt>" & postDate) & "</wp:post_date_gmt>" & newLine & tab & "<wp:comment_status>open</wp:comment_status>" & newLine & tab & "<wp:ping_status>open</wp:ping_status>" & newLine & tab & "<wp:post_name>" & titl & "</wp:post_name>" & newLine & tab & "<wp:status>publish</wp:status>" & newLine & tab & "<wp:post_parent>0</wp:post_parent>" & newLine & tab & "<wp:post_type>post</wp:post_type>" & newLine & "</item>" & newLine & newLine
            write nextText to exportFile as «class utf8»
         end repeat
      end repeat
      write rssTail to exportFile as «class utf8» -- xml/rss file closing stuff
      
   on error errStr number errorNumber
      if exportFile is not equal to 0 then
         close access exportFile
         set exportFile to 0
      end if
      error errStr number errorNumber
   end try
   
   if exportFile is not equal to 0 then
      close access exportFile
      set exportFile to 0
   end if
end run

So far things are pretty simple. The script loops through the categories, and in each category it pulls out all the episodes. Only it kept stalling. It turns out that sometimes iBlog took so long to respond that the script gave up waiting. I added

Code:
   with timeout of 600 seconds

at the start to make the script wait a full ten minutes for iBlog to respond. Yes, iBlog certainly is no jackrabbit of a program.

Now the program ran! The only problem is, the resulting file doesn't work. Hm. The first thing the importer reports is that it can't read the dates the way AppleScript formats them. So, I added a function to reformat all the dates to match the example. Then it was importing categories, but not items. Why not?

Um... actually I don't remember the answer to that one. Let's just say that it took a lot of fiddling and testing to get it right. Eventually, hurrah! There in my WordPress installation were episodes from iBlog.

And they looked like crap. The thing is, iBlog included unnecessary HTML tags around the blog title, excerpt, and body. It's going to be a lot easier to clean them up now, while we're mucking with each bit of text anyway, so back to AppleScript's lousy string functions we go to clean up iBlog's mess. Now, after we get all the data from iBlog, we call a series of functions to clean it all up:

Code:
               set titl to stripParagraphTags(titl)
               set desc to stripParagraphTags(desc)
               set postDate to formatDate(postDate)
               set bod to fixBlogBodyText(bod, postDate)

The actual functions are available in the attached final script.
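
As a rough illustration of the date step: the sample export file uses dates shaped like 2007-03-27 18:23:57, so the formatter has to turn an AppleScript date into that form. What follows is only a sketch of one way to do it, not the formatDate from the attached script. The twoDigits helper is made up for the example, and no time-zone conversion is attempted (the sample file uses the same value for the local and GMT dates).

```applescript
-- Hypothetical helper: left-pad a number to two digits ("7" becomes "07")
on twoDigits(n)
	return text -2 thru -1 of ("0" & n)
end twoDigits

-- Sketch of a date formatter producing "YYYY-MM-DD HH:MM:SS"
on formatDate(d)
	set y to (year of d) as text
	set m to twoDigits((month of d) as integer)
	set dd to twoDigits(day of d)
	set secs to time of d -- seconds since midnight
	set h to twoDigits(secs div 3600)
	set mi to twoDigits((secs mod 3600) div 60)
	set s to twoDigits(secs mod 60)
	return y & "-" & m & "-" & dd & " " & h & ":" & mi & ":" & s
end formatDate

-- Usage: format the current date and time
formatDate(current date)
```

Working from the seconds-since-midnight value keeps the sketch independent of the date display settings on the Mac running it.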

Things are looking better, but still not very good. Much of this is due to some junk iBlog did when converting my older episodes into iBlog 2 format. One thing it did was to insert hard line breaks in the text of the blog body. No idea why. Maybe they were there all along and I had no way to see them. WordPress helpfully assumes that if you have a line break in the data it imports, you want a line break when it shows on the screen. So, every line break is replaced by a <br /> tag when imported into WordPress. This will not do. Additionally, iBlog replaced paragraph breaks </p><p> with a pair of break tags: <br /><br />. Once again, the reason for this is a mystery. The latter issue is less important, but we may as well address it while the hood is up.

Back we go into the fixBlogBodyText function, to repair more silly iBlog formatting. The resulting function looks like this:

Code:
on fixBlogBodyText(s, postDate)
   -- this assumes that if an episode is supposed to start with a div, it will have a style or class
   if (the offset of "<div>" in s) is equal to 1 then
      set s to text 6 thru (the (length of s) - 6) of s
      
      -- in some cases there was an extra line feed at the end of the text as well
      if the last character of s is "<" then
         set s to text 1 thru (the (length of s) - 1) of s
      end if
      set s to "<p>" & s & "</p>"
   end if
   
   -- clean up iBlog junk (lots of this stuff is the result of upgrading to iBlog 2; the conversion was not clean)

   -- replace all line breaks with spaces
   set s to replaceAll(s, "
", " ")
   -- replace all double-break tags with paragraph tags
   set s to replaceAll(s, "<br /><br />", "</p>" & newLine & "<p>")
   -- replace all old-fashioned double-break tags with paragraph tags
   set s to replaceAll(s, "<br><br>", "</p>" & newLine & "<p>")
   -- get rid of some pointless span class info
   set s to replaceAll(s, " class=\"Apple-style-span\"", "")
   
   return s
end fixBlogBodyText

Notes: replaceAll is a utility function I wrote that does pretty much what it says. You will find it in the attached source file. newLine is a variable I defined because, left to its own devices, AppleScript uses the obsolete Mac OS 9 line endings. What's up with that? The post date is passed as a convenience for identifying blog entries in the error logs.
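
For reference, a common way to write a replace-all helper in AppleScript is with text item delimiters: split the text on the search string, then join the pieces with the replacement. The sketch below assumes that approach; the replaceAll in the attached script may well be written differently.

```applescript
-- Sketch of a replace-all helper built on text item delimiters
on replaceAll(theText, searchString, replacementString)
	-- remember the current delimiters so the caller's settings are left alone
	set savedDelimiters to AppleScript's text item delimiters
	-- split on the search string...
	set AppleScript's text item delimiters to searchString
	set theParts to text items of theText
	-- ...and join the pieces back together with the replacement
	set AppleScript's text item delimiters to replacementString
	set theResult to theParts as text
	set AppleScript's text item delimiters to savedDelimiters
	return theResult
end replaceAll

-- Usage: turn old-fashioned double breaks into paragraph boundaries
replaceAll("one<br><br>two<br><br>three", "<br><br>", "</p><p>")
-- result: "one</p><p>two</p><p>three"
```

Matching follows AppleScript's normal text comparison rules, so it ignores case unless the call is wrapped in a considering case block.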

At this point the text is importing mostly nicely. But wait! I was running my tests just working with one category to save time. When I looked at Allison in Anime on WordPress, some really weird things started happening. It turns out that when importing the data, you need line breaks every now and then, otherwise the importer will insert them. That would be nice to put in the documentation somewhere! In one of my episodes, the newline was inserted right in the middle of a <div> tag, which led to all kinds of trouble. So, to the above script I added a line that inserts a line break between </p><p> tags. As long as any one paragraph isn't too long, I'll be all right.

Code:
   set s to replaceAll(s, "</p><p>", "</p>" & newLine & "<p>")

And with that, we've done it! We've written a script that will export all the data from iBlog 2 and format it in a way that WordPress can accept. Time to run it on the whole blog, go take a little break, and come back and see how things went...

Dang. Didn't work. There's a maximum file size for import, and my blog is too damn big. Not a huge problem, just a bit of modification to make each category a separate file. Now, at last, the data is imported, the text looks nice, and we're ready to make the move to our new home.

Except...

The images don't show up, and links between episodes are broken. Also, it would be nice if people could still read the old Haloscan comments. I guess we're not done yet.

Image links were the easiest to repair. In iBlog 2 the source code always looks for the image at path /Media/. We just have to find those links and replace them with new info. I used Automator to find all the image files in the iBlog data folders, then I copied them all up to a directory on the WordPress server, and pointed all the links there. Worked like a charm! (Icerabbit goes into more detail on that process here. I used different tools, but the process is the same.)

Links between episodes turned out to be a lot trickier. It came down to this: How do I know what the URL of the episode is going to be when I load it into WordPress? I had to either know what the episode's id was going to be, or I had to know what its nicename was going to be.

Nicename is a modified title that can be used in URLs: no spaces and whatnot. "Rumblings from the Secret Labs" becomes "rumblings-from-the-secret-labs". If I set up WordPress to use the nicename to link to an episode rather than the ID number, it would have some advantages, but I can get long-winded (have you noticed?) and that applies to my episode titles as well. The URLs for my episodes could get really long. Therefore, I'd rather use the episode's ID for its permalink. (If you try the icerabbit link above, you will see the nicename version of a link.)
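
To make the nicename idea concrete, here is a small sketch that builds one from a title. It is an approximation for illustration only: makeNiceName is a made-up name, the handler shells out to tr and sed because AppleScript has no built-in lowercase function, and WordPress's own rules may treat accented characters and punctuation differently (here anything outside a-z and 0-9 simply collapses into a hyphen).

```applescript
-- Sketch: approximate a WordPress-style nicename from a title
on makeNiceName(theTitle)
	-- lowercase the title, collapse every run of other characters into one hyphen,
	-- then trim hyphens from both ends
	set cmd to "printf '%s' " & quoted form of theTitle & " | tr '[:upper:]' '[:lower:]' | sed -E 's/[^a-z0-9]+/-/g; s/^-+//; s/-+$//'"
	return do shell script cmd
end makeNiceName

-- Usage
makeNiceName("Rumblings from the Secret Labs")
-- result: "rumblings-from-the-secret-labs"
```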

Happily, the import file format allows me to specify the id of episodes I upload. (I don't know what it does if there's already an episode with that ID.) After some fiddling I managed to specify reliably what ID to give each episode. Now in my script I make a big table with the iBlog paths to each episode and the ID I will assign it. Before the main loop I have another that builds the table:

Code:
      -- first loop
      set postID to firstPostID
      set idTableRef to a reference to episodeIDTable
      tell application "iBlog" to set cats to the categories of the first blog
      repeat with cat in cats
         --set cat to item 1 of cats
         tell application "iBlog" to set catFolderName to the folder name of cat
         --display dialog catFolderName
         copy {catFolderName, -1} to the end of idTableRef
         tell application "iBlog" to set ents to the entries of cat
         repeat with ent in ents
            tell application "iBlog" to set episodeFolderName to the folder name of ent
            set episodePath to catFolderName & "/" & episodeFolderName
            copy {episodePath, postID} to the end of idTableRef
            set postID to postID + 1
         end repeat
      end repeat

Now it's possible to look up the ID of any episode and build the new link. The lookup code is in the attached script; it also handles the special cases of linking to a category page and to the main page. For category pages, I just hand-built a table of the category IDs I needed based on previous import tests.
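
A minimal sketch of what such a lookup could look like, assuming the table is a list of {path, ID} pairs as built in the loop above. The handler names, the sample folder names, and the use of missing value for "not found" are all assumptions made for the example; the link uses WordPress's default ?p= permalink form. The real lookup in the attached script also covers the category and main-page cases, which this does not.

```applescript
-- Sketch: find the post ID recorded for an iBlog episode path
on lookupPostID(episodePath, idTable)
	repeat with entry in idTable
		set {thePath, theID} to contents of entry
		if thePath is equal to episodePath then return theID
	end repeat
	return missing value -- path was never recorded
end lookupPostID

-- Sketch: build an ID-based WordPress link
on linkForPostID(blogURL, postID)
	return blogURL & "/?p=" & postID
end linkForPostID

-- Usage with a hand-made table (folder names are invented; -1 marks a category row)
set episodeIDTable to {{"Travel", -1}, {"Travel/Delayed_by_Weather", 1065}, {"Travel/Another_Episode", 1066}}
set postID to lookupPostID("Travel/Delayed_by_Weather", episodeIDTable)
linkForPostID("http://jerssoftwarehut.com/muddled", postID)
-- result: "http://jerssoftwarehut.com/muddled/?p=1065"
```

A linear scan is slow for a big table, but it only runs once per link and is still far quicker than anything that involves asking iBlog.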

Finally, there is the task of preserving the links to the old comment system. Happily, those Haloscan comments are also connected based on the file path of the episode. (Though it looks like really old comments are not accessible, anyway, which is a bummer.)

In the main loop, after the body text has been cleaned up, tack the link to Haloscan on the end, complete with hooks to allow CSS formatting:

Code:
               set bod to bod & newLine & newLine & "<div class=\"jsOldCommentBlock\"><span>Legacy Comment System:</span> <a href=\"javascript:HaloScan('" & entFolder & "');\"><script type=\"text/javascript\">postCount('" & entFolder & "'); </script></a></div>"

Not mentioned above are functions for logging errors and a few other utilities that are in the main script file. They should be pretty obvious. The script includes code that is specific to issues I encountered, but it should be a good start for anyone who wants to export iBlog 2 data for import into another system. It SHOULD be safe to execute on your iBlog data; it doesn't change anything on the iBlog side of things. I don't know if there's anyone else in the world even using iBlog 2 anymore, but if you would like help with this script, let me know.
Full-time writer: Muddled Ramblings and Half-Baked Ideas
Part-time geek: Jer's Software Hut, home of Jer's Novel Writer
Jerry
Site Admin
 
Posts: 14
Joined: Wed Mar 11, 2009 6:47 pm
Location: Prague

Re: iBlog -> WordPress

Post by Jerry » Tue Mar 17, 2009 4:08 pm

If there is demand, I will make this article into more of a how-to. If only one or two people need it, it would probably be faster for me to just customize the script for them.

The Actual Script and instructions.

Post by Jerry » Mon Mar 23, 2009 2:33 pm

Attached is a zip file containing the script. I've tried to move all the stuff that you need to adjust to a block at the top, so you can just tweak those values and run the script. Here's what you do:
  1. download the zip file
  2. open the script with Script Editor (you already have this)
  3. modify the six or so variables at the top (most are paths and URLs)
  4. run the script - an import file is created for each category
  5. review errorlog.txt - probably won't be interesting
  6. import the first category (In WordPress, Admin->Tools->Import->WordPress)
  7. look at all the things that went wrong - although there's always a chance it will work first try...
  8. tweak the script or contact me here
  9. delete all your bad posts and run the script again
  10. when everything looks right, party!

The categories are separate files because my blog exceeded the maximum import size for WordPress. If your blog is smaller but has lots of categories, I could add an option in the script to export everything as one file.

Also note that if you have links to category pages within your blog episodes, you need to define those categories in WordPress before you run the script for the final time. If it's an issue for anyone, I'll write more about it.

Anyway, here it is! Feel free to ask questions.
export iBlog.zip
The latest version of the script to extract iBlog data
(33.58 KiB) Downloaded 61 times

Re: iBlog -> WordPress

Post by mailking » Mon Mar 23, 2009 3:13 pm

We travel, we go places, and I use iBlog2 at the moment, but I also have an old iBlog1 log online.
I want both the blogs to migrate to Drupal.
And I don't want to import all the iBlog1 logs first into iBlog2.
So I would like to export both logs into something that Drupal can read.
I found that small application agitprop, but I have to look into that...

I will copy this page later and reread the whole thing; no time now. This is a quick wifi job, getting my mail and out again. Maybe next week?
Coen
aka mailking
http://www.landcruising.nl
mailking
 
Posts: 1
Joined: Mon Mar 23, 2009 12:53 pm

Re: iBlog -> WordPress

Post by Jerry » Mon Mar 23, 2009 3:41 pm

I don't know if any of the Drupal blogs will import WordPress WXR files or if you will need another format. I thought about using Drupal (I've used it for other things) but in the end chose WordPress for my blog. Drupal's pretty cool but was going to take way more coding to do some of the category-based things I like.

Agitprop will work for iBlog 1 but not 2; I have no idea if scripting support was in iBlog 1, which is what my system depends on. If there is scripting support, then it would be possible to modify my script to work on iBlog 1 as well. Since Agitprop already works, though, it doesn't seem like it's worth bothering.

