Isn’t it frustrating when it’s nearly impossible to find information on moving data to new systems? This is one reason I shy away from recommending data systems that are not business-class… such as the Wiki server feature found in OS X Snow Leopard. I wanted to share this post due to my frustration of not being able to find an easy migration path away from this Wiki software and onto something more business-friendly. Hopefully it can help someone in a similar situation.
I’ve been working a lot lately in Microsoft Flow – an automation tool that works great for workflows and connecting different software packages to make them work together. Because Flow is designed for repetitive tasks, it seemed like it could be a candidate for this function even though you wouldn’t normally think of it for this purpose.
Here’s a little background on the native state of the Wiki, and what we’re dealing with:
- The raw files are stored on the Mac server in /Applications/Server.app/Contents/ServerRoot/usr/share/collabd
- Each subfolder.page is basically an individual post.
- The files in each of these folders that you’ll want to work with are the .plist files. You’d think that you’d want the .html files, but Apple doesn’t make it that easy for you. Looking at the .plist in a text editor we see that the .plist is where almost all of the post detail is stored in an HTML-like format.
- If the post included attachments such as files or images, they should also reside in this folder.
Let me start off with a couple of caveats before I go into the technical detail of how this will work. First, this is a basic migration solution. There is no native compatibility between the OS X Wiki and, well, pretty much anything else. This process at least allows us to grab some of the more useful information from the Wiki and take it to a system which will hopefully be a more future-proofed home; in this case, SharePoint Online. But because of this, we’re going to lose a lot of the features that were used on the Mac Wiki: user comments, inline pictures, and file attachments are some of the things that will not carryover (or at least nicely). If the Wiki was used as a knowledge base of sorts, which my example was, then this may be OK. The frequently-accessed KB articles can be easily mended by hand and the others can be handled by attrition.
OK, lets get to it. We’re going to start with a blank Flow. Now right off the bat you’re going to have some choices to make; the first of which being how to get Flow access to the .plist files to begin with. I chose to accomplish this with a OneDrive folder. This worked really well for a couple of reasons:
- Flow can access OneDrive for Business with a native connector, and you can trigger your Flow when it sees new files in the specified OneDrive folder. This is just straight convenience.
- If you have Flow you should have OneDrive, so why not?
Whatever you choose, one more catch with Flow that hung me up for a second was the file format. It didn’t want to try to read the data from a .plist file. So before you bring them into Flow, change the file extension to something less Apple-ish such as .txt or .html.
Once you have your file import method sorted out then we need to start pulling the data out of the .plist’s. I first identified the data fields that were most important to me that I needed to make sure pulled over into our new Wiki. For me, these were:
- Created Date
- UID (I’ll explain this one more later)
It’s important to understand the layout of the .plist files. Within the .plist each of the fields we want is encaspulated in <key></key> tags. Then, the content that corresponds to that field will immediately follow it within <string></string> tags. Within Flow we can use this to our advantage to use the particular key tags to determine where in the file to pull content from.
In Flow, I started with initializing a string variable for each chunk of content that I wanted to end up with:
For the Title, I used substring() to extract the data between certain points starting at the phrase “>title<” as shown below:
Basically all we’re doing here is calculating the position of the start of the data (we know where it starts because it begins with <key>title</key> and is immediately followed by the next content tag <key>tombstoned</key>. Notice in my expression above that I chose to use >title< and >tombstoned< rather than <key>title</key> and <key>tombstoned</key>. This is because I found that the substring() function in Flow did not seem to like the expression if it was built with these full tags, something with the extra special characters made it throw an error basically stating the there was zero-length content.
Luckily for us the date is in a format that is easy for Flow to consume. Since it does come over in UTC though, after we grab the content using substring() we send it to the ‘Convert Time Zone’ action to get it into our local time zone:
The actual body of the Wiki post posed a new issue for us; look closely and you’ll see that it is typical HTML, but uses character entities for the greater than/less than symbols: < for < and > for >. If you want to leave them you can, however for my purpose I took them out using a couple of quick expressions using the replace() function:
Now that we have our data, it’s time to start inserting it into HTML that we can usefully take wherever we want. I accomplished this by using a number of Compose actions that brought the content into the correct places between HTML tags:
Take note in the image above that I included the original page’s meta data as part of the HTML header. What this does for us is allow us to very quickly match one of our new Wiki pages with the old folder from OS X. This gives us the ability to go back and grab attachments, images, or the raw files again if needed.
Now that we have our HTML content all brought together, in order to build the file itself I once again turned to OneDrive to build a brand new file with the HTML content. I also saved my files as .aspx since I wanted to ultimately bring them into SharePoint: