Publitz: An Approach to Publishing

tzbits
by Jason Kantz
Sep 2018
 vol 1.

Publitz is a document format and publishing system for sharing documents on the Internet. The major parts of publitz include:

  1. an HTML template designed for "published" documents
  2. a plaintext "markaround" language
  3. APIs for interacting with documents, writers, and other readers

This document outlines some design goals for publitz.

Documents Are Self-contained

Once you have the publitz HTML file, no other network requests are required to view the document. This means there's only a single download of a publitz HTML file, and it works offline.

It also means publitz documents can be verified. A document has two parts: the known HTML template, which forms an 'envelope', and the inner content. The envelope contains a SHA-1 hash and a SHA-2 hash of the inner content. The shorter SHA-1 hash can identify the document: it can serve as a key in a database or as a value that is digitally signed by an author. Both hashes can be used to verify that the document contents have not been changed.

 $ publitz verify mydoc.htm
 c431cd262a5101db09928df23d6ff3f55ca7927a mydoc.htm OK
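A minimal sketch of how such a check might work, assuming the inner content has already been extracted from the envelope as bytes and that the SHA-2 variant is SHA-256 (the text above does not say which). The function names and the dictionary layout are hypothetical and are not taken from the publitz tool.

```python
import hashlib

def content_hashes(inner: bytes) -> dict:
    """Return the two digests an envelope might record for the inner content."""
    return {
        "sha1": hashlib.sha1(inner).hexdigest(),
        "sha256": hashlib.sha256(inner).hexdigest(),  # assumed SHA-2 variant
    }

def verify(inner: bytes, envelope: dict) -> bool:
    """True only if every recomputed digest matches the one in the envelope."""
    actual = content_hashes(inner)
    return all(envelope.get(name) == digest for name, digest in actual.items())

inner = b"<p>Hello, publitz.</p>"
envelope = content_hashes(inner)          # what a publishing step might store
print(verify(inner, envelope))            # True: content unchanged
print(verify(inner + b" ", envelope))     # False: content was altered
```

The 40-character SHA-1 hex digest is the value that would serve as the short document key.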

The verification can be taken further to ensure that the document content contains only an allowed subset of HTML tags. Then, if the content passes validation and the envelope matches a declared version of publitz, the viewer of the document has reasonable assurance that no additional, unknown code is executed when viewing the document.
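The tag check could be sketched as follows, using Python's standard html.parser module. The allowed set here is a made-up example, not the list publitz uses, and the sketch looks only at tag names; a real validator would also need to inspect attributes such as event handlers.

```python
from html.parser import HTMLParser

# Hypothetical allow-list, chosen only for illustration.
ALLOWED_TAGS = {"p", "h1", "h2", "h3", "a", "em", "strong", "ul", "ol", "li"}

class TagCollector(HTMLParser):
    """Collects the name of every start tag that is not on the allow-list."""

    def __init__(self):
        super().__init__()
        self.disallowed = set()

    def handle_starttag(self, tag, attrs):
        if tag not in ALLOWED_TAGS:
            self.disallowed.add(tag)

def disallowed_tags(inner_html: str) -> set:
    """Return the set of tag names in inner_html that are not allowed."""
    collector = TagCollector()
    collector.feed(inner_html)
    collector.close()
    return collector.disallowed

print(disallowed_tags("<p>Fine.</p><script>alert(1)</script>"))  # {'script'}
```

Content passes this part of validation when the returned set is empty.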

Screen Matches Print

The publishing step breaks the text into pages. An anchor link for the page can be added when viewing the HTML file. This allows specific pages to be cited within a long document. For example:

https://kantz.com/htm/watz.htm#p3

Page 3 on the screen is page 3 in the printed document. This means a citation remains valid regardless of the way the document is presented: online, physical printed document, print to PDF, etc.

Screen Approximates a Book

A page indicator at the top of the screen gives a visual estimate of the document's length, much as the thickness of a book indicates its length. The page indicator allows clicking between pages, which makes it easy to binary search through a document. A "print mode" displays the entire document on the screen at once for printing. Print mode also makes it easier to search for text across the whole document using the browser's built-in search tool.

Publitz Style

Similar to how APA is a publishing style for the social sciences or MLA is a style for the humanities, it would be a good outcome for publitz if it settled into being a format used by generalists to publish their ideas for other generalists. Publitz should work well for publishing both fiction and non-fiction. Publitz is all about the writing, and with the consistency of the format, the particular details of it should fade into the background.

  • Publitz uses HTML to be most widely accessible.
  • The publitz template uses a widely available font for consistent appearance and text wrapping.
  • The font needs to be open and metric-compatible with Times New Roman, e.g., Liberation Serif.
  • Publitz needs conventions for citing sources, for formatting "front matter", and for being "readable" on the screen.

Advertisements

Since publitz documents are self-contained, writers will not be able to put dynamic ads in their documents. A better approach with publitz is to put advertisements on index pages that link to publitz documents. This is a more appropriate place for advertisements anyway, since it locates them where the reader is deciding what to read next.

Markaround

The plaintext format for publitz is a "markaround" language instead of "markdown" (Gruber, 2004).

The plaintext has single-line commands that change the mode of the document. For example, a centered paragraph followed by two normal paragraphs looks like:

 #:p.center
 
 This is centered

 #:p
 
 This is not centered.

 And this is a third paragraph that is also not centered
 because the mode is sticky.
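One way to read this format is sketched below: a line beginning with "#:" sets the mode, a blank line ends a paragraph, and each paragraph is tagged with whatever mode is current. The default mode of "p" and the function name are assumptions made for the sketch; the actual publitz parser may differ.

```python
def parse_markaround(text: str) -> list:
    """Return (mode, paragraph) pairs; a '#:' line sets the mode until the next one."""
    mode = "p"  # assumed default mode before any command line is seen
    blocks, current = [], []

    def flush():
        # Close the paragraph being collected, tagging it with the current mode.
        if current:
            blocks.append((mode, " ".join(current)))
            current.clear()

    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("#:"):
            flush()
            mode = stripped[2:]
        elif not stripped:
            flush()
        else:
            current.append(stripped)
    flush()
    return blocks

sample = """#:p.center

This is centered

#:p

This is not centered.

And this is a third paragraph that is also not centered
because the mode is sticky.
"""

for mode, paragraph in parse_markaround(sample):
    print(mode, "|", paragraph)
```

The third paragraph comes out tagged "p" even though no command line precedes it, which is the sticky behavior described above.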

The document easily converts to plaintext with

 grep -v -e "^#"
 

Hyperlinks

The plaintext converts explicit URIs that begin with "http" into hyperlinks. The syntax intentionally omits support for implicit hyperlinks. This is to incentivize having a "References" section. Some reasons for this decision:

  • Implicit links make it too easy to insert a link, without a corresponding entry on the "References" page.
  • Implicit links assume links work forever.
  • Explicit links encourage writers to collect, think about, and give respect to the sources being used.
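The conversion of explicit URIs could be sketched like this. The regular expression and the trimming of trailing sentence punctuation are assumptions; the text above says only that URIs beginning with "http" become hyperlinks.

```python
import html
import re

# Assumed pattern: "http" or "https", then everything up to whitespace or a quote/bracket.
URI_PATTERN = re.compile(r"https?://[^\s<>\"]+")

def autolink(text: str) -> str:
    """Escape text for HTML and wrap each explicit http(s) URI in an anchor."""
    out, last = [], 0
    for match in URI_PATTERN.finditer(text):
        # Leave trailing sentence punctuation outside the link.
        uri = match.group(0).rstrip(".,;:!?)")
        out.append(html.escape(text[last:match.start()]))
        escaped = html.escape(uri)
        out.append(f'<a href="{escaped}">{escaped}</a>')
        last = match.start() + len(uri)
    out.append(html.escape(text[last:]))
    return "".join(out)

print(autolink("See https://kantz.com/htm/watz.htm#p3."))
```

Because the link text is always the URI itself, the reader sees exactly where a link leads.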

This decision means that publitz will not work well for a collection of documents used as reference material. Implicit links among reference docs are often very useful when trying to find some specific bit of information without wanting to read any one document entirely.

TODO: Syntax for linking to specific pages /within/ a document.

Extending

Making the plaintext format line-oriented sets it up for various kinds of processing with a "ptz" command.

Consider the following headings in "mydoc.ptz":

 #:h1
 Book

 #:h2
 Chapter

 #:h3
 Section

They can be numbered/renumbered with:

 $ ptz renumber mydoc.ptz
 #:h1
 1 Book

 #:h2
 1.1 Chapter

 #:h3
 1.1.1 Section
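A rough sketch of how renumbering might be done, one line at a time. It assumes that the first non-blank line after a "#:h1" to "#:h6" command is the heading title, and that any leading number on that title is an old number to discard, so a title that genuinely starts with a number would be mangled. This illustrates the idea and is not the ptz implementation.

```python
import re

HEADING = re.compile(r"^#:h([1-6])$")
OLD_NUMBER = re.compile(r"^\d+(\.\d+)*\s+")

def renumber(lines: list) -> list:
    """Prefix each heading title with a dotted section number."""
    counters = []   # counters[i] is the current number at heading level i + 1
    level = None    # set while the next non-blank line is a heading title
    out = []
    for line in lines:
        match = HEADING.match(line.strip())
        if match:
            level = int(match.group(1))
            out.append(line)
        elif level is not None and line.strip():
            # Truncate or pad the counters to this level, then increment.
            counters = counters[:level] + [0] * (level - len(counters))
            counters[level - 1] += 1
            title = OLD_NUMBER.sub("", line.strip())
            out.append(".".join(map(str, counters)) + " " + title)
            level = None
        else:
            out.append(line)
    return out

doc = ["#:h1", "Book", "", "#:h2", "Chapter", "", "#:h3", "Section"]
print("\n".join(renumber(doc)))
```

Running the function on its own output gives the same result, which is what makes it usable for renumbering as well as numbering.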

Suppose for a moment these sections are on pages 3, 4, and 21 respectively. A table of contents can then be generated:

 $ ptz toc mydoc.ptz > toc.ptz
 $ cat toc.ptz
 #:toc
 #:p3 1 Book
 #:p4 1.1 Chapter
 #:p21 1.1.1 Section
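Producing that output is simple once the page of each heading is known. The sketch below assumes the paging step hands over a list of (page, title) pairs; how ptz actually determines the pages is not described here, and the function name is hypothetical.

```python
def make_toc(headings: list) -> str:
    """Build table-of-contents text from (page, title) pairs."""
    lines = ["#:toc"]
    lines += [f"#:p{page} {title}" for page, title in headings]
    return "\n".join(lines) + "\n"

print(make_toc([(3, "1 Book"), (4, "1.1 Chapter"), (21, "1.1.1 Section")]), end="")
```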

Updates can be coordinated in a Makefile, and the original document includes the table of contents with:

 #:include toc.ptz
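Include handling could be sketched as a recursive expansion. The sketch assumes that the included path is relative to the including file and that nested includes are allowed; neither detail is specified above, and the cycle check is an added safeguard rather than documented behavior.

```python
from pathlib import Path

INCLUDE = "#:include "

def expand_includes(path: Path, seen: frozenset = frozenset()) -> list:
    """Return the lines of path with each '#:include' line replaced by the file it names."""
    path = path.resolve()
    if path in seen:
        raise ValueError(f"include cycle at {path}")
    out = []
    for line in path.read_text(encoding="utf-8").splitlines():
        stripped = line.strip()
        if stripped.startswith(INCLUDE):
            # Assumed: the target is resolved relative to the including file.
            target = path.parent / stripped[len(INCLUDE):].strip()
            out.extend(expand_includes(target, seen | {path}))
        else:
            out.append(line)
    return out
```

Calling expand_includes(Path("mydoc.ptz")) would return the document's lines with the contents of toc.ptz spliced in at the include line.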

Distribution and Reader/Writer Interaction

This is the area that's still under consideration. Here are some initial ideas.

  • Authors publish their document by uploading it to a service that keys the document by the inner document hash.
  • Since the file is self-contained, authors might digitally sign the hash.
  • The service allows for revisions: one hash/document might replace another.
  • How should readers be led to the latest document revision or edition?
  • References section: cite publitz documents with a URI like ptz://c431cd2...
  • Structured references make it easy to build out "most cited" metrics.
  • Maybe implement this as a dApp, with uploads going to Swarm.
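To make the first few ideas concrete, here is an in-memory sketch of a service keyed by the inner document hash. The class, its method names, and the way revisions are recorded are all hypothetical; since this area is still under consideration, the sketch shows one possible shape and not a design decision.

```python
import hashlib

class DocumentStore:
    """In-memory stand-in for a service keyed by the inner document hash."""

    def __init__(self):
        self._docs = {}         # SHA-1 hex digest -> inner content bytes
        self._replaced_by = {}  # older digest -> digest of the revision replacing it

    def publish(self, inner: bytes, replaces: str = None) -> str:
        """Store the content and return its key, optionally marking an older key as replaced."""
        key = hashlib.sha1(inner).hexdigest()
        self._docs[key] = inner
        if replaces is not None and replaces != key:
            self._replaced_by[replaces] = key
        return key

    def latest(self, key: str) -> str:
        """Follow the revision chain from key to the newest known digest."""
        seen = set()
        while key in self._replaced_by and key not in seen:
            seen.add(key)
            key = self._replaced_by[key]
        return key

    def uri(self, key: str) -> str:
        """Format a key as a ptz:// reference."""
        return f"ptz://{key}"

store = DocumentStore()
first = store.publish(b"first edition")
second = store.publish(b"second edition", replaces=first)
print(store.uri(store.latest(first)))
```

Looking up an old hash and following the chain is one possible answer to the question of how to lead a reader to the latest revision.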

If you have suggestions or want to collaborate, please reach out via email: mailto:jason@kantz.com

References

American Psychological Association. (2009). Publication manual of the APA, 6th ed.

Eastlake, D., Hansen, T. (2011). US secure hash algorithms. Internet Engineering Task Force Request for Comments 6234. AT&T Labs: May 2011. Retrieved from https://ietf.org/rfc/rfc6234.txt

Faulkner, S., Eicholz, A., Leithead, T., Danilo, A., Moon, S. (2017). HTML 5.2 W3C Recommendation. Retrieved from https://www.w3.org/TR/2017/REC-html52-20171214/

Gruber, J. (2004). Markdown [computer software]. Philadelphia, PA. Available from https://daringfireball.net/projects/markdown/

The Modern Language Association of America. (2016). MLA handbook, 8th ed.

Trón, V., Elad, Destinatis, Aron. (2018). Welcome to the Swarm Documentation! Retrieved from https://swarm-guide.readthedocs.io/en/latest/index.html

Webbink, M. (2007). Liberation Fonts. Red Hat Blog. Retrieved from https://www.redhat.com/en/blog/liberation-fonts