One of the principles of SuperTXT is to enhance readability by both humans and machines in a simple text format. Much can be inferred by a machine by interpreting text, along with the interconnections with hyperlinking. This is what popular search engines have been doing for decades at a cost of complexity. What if we can put explicit semantic information into our documents so that simple tools can read it?
Many of the documents in this site contain a point underneath the top-level heading that looks like this.
# Heading
* type: Guide
Notice that the point is right underneath the single top-level heading. Also, there's a colon in it. This is a semantic point for the document.
With this convention we now have a really simple way to see what type each document is, and to find documents of a particular type. Many tools can search a file tree for a snippet of text. Here is the text that we can use to find all the types.
* type:
For better precision we can use a regular expression, which is also well supported, matching the beginning of a line and greedily matching until the end of the line.
^\* type: .*
If we use something like grep on this site we might get an output like this with an overview of the documents and their types.
whats-sshla.s.txt:* type: Article
intro-to-ssc.s.txt:* type: Article
specs/pathname2.s.txt:* type: Specification
specs/command-reflection.s.txt:* type: Specification
specs/show-output.s.txt:* type: Specification
specs/supertxt.s.txt:* type: Specification
specs/report-progress.s.txt:* type: Specification
00-intro.s.txt:* type: Introduction
hosting.s.txt:* type: Guide
browsing.s.txt:* type: Guide
semantics.s.txt:* type: Guide
start.s.txt:* type: GettingStartedGuide
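For readers who would rather script this than use grep, here is a minimal Python sketch of the same search. It assumes that documents use the ".s.txt" suffix seen in the listing above. The function name find_types is made up for illustration and is not part of any SuperTXT tooling.

```python
import re
from pathlib import Path

# Same pattern as the regular expression above, with the value captured.
TYPE_POINT = re.compile(r"^\* type: (.*)$")

def find_types(root):
    """Yield (relative path, type) for every '* type:' line under root."""
    root = Path(root)
    # Assumption: SuperTXT documents end in ".s.txt".
    for path in sorted(root.rglob("*.s.txt")):
        with path.open(encoding="utf-8") as handle:
            for line in handle:
                match = TYPE_POINT.match(line.rstrip("\n"))
                if match:
                    yield path.relative_to(root).as_posix(), match.group(1)

if __name__ == "__main__":
    for name, doc_type in find_types("."):
        print(f"{name}:* type: {doc_type}")
```

Like grep, this reports every matching line in a file, not only the one directly under the top-level heading.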
Each line tells us something about the subject, which is the document in this case. But we're looking at one particular aspect of the document, the predicate "type." Finally, the last bit of information is the value (object) of that predicate. These three pieces of information form a subject-predicate-object "triple" that gives us one statement of semantic meaning. This kind of semantics resembles a much simplified form of RDF (Resource Description Framework).
Every bit of semantic information in SuperTXT has this semantic structure with a subject, predicate and object.
<subject> <predicate> <object>
For semantics about the document, as in the examples above, the subject is the document itself. The predicate is the portion of the point before the colon character. The value is the remaining portion of the point. Semantic points cannot be placed further down in the document. They must appear before any other SuperTXT elements, such as paragraphs, quotes, and preformatted blocks. This helps readers, who can see details of the document immediately, and it helps tools, which can pull this information out quickly without reading the entire document.
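The rule that semantic points come first makes them cheap to extract. The Python sketch below reads only as far as it has to. It assumes that points start with "* ", that the predicate ends at the first colon, and that any other line (including a blank one) ends the run of points. The name document_points is hypothetical.

```python
def document_points(text):
    """Return (predicate, object) pairs found directly under the top-level heading."""
    points = []
    seen_heading = False
    for line in text.splitlines():
        if not seen_heading:
            # Skip everything up to and including the single top-level heading.
            if line.startswith("# "):
                seen_heading = True
            continue
        if line.startswith("* ") and ":" in line:
            # Split on the first colon only, so values may contain colons.
            predicate, _, value = line[2:].partition(":")
            points.append((predicate.strip(), value.strip()))
        else:
            break  # any other element ends the semantic points
    return points
```

Because the loop stops at the first line that is not a point, a tool never needs to read the rest of the document.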
Sometimes semantics don't have simple scalar values (i.e. a simple string). The object of a triple might be a separate entity (i.e. a subject in its own right) from the document itself, one that holds its own semantic triples. This is where semantic headings come in. You can declare a heading in your document that has its own semantic points.
## (e1) Entity 1
* foo: bar
Here the triple looks something like this.
document.s.txt:@e1 foo bar
Notice that the anchor gave the entity an identifier e1, so that's the subject of this triple. The predicate is "foo" and the value is "bar." Notice that this entity is completely separate from the document. It can live in any document.
Sometimes the entity only makes sense in relation to a document or a higher-level entity (heading). In this case you can put the predicate into the heading, which indicates the relationship to the parent. Here is a contact list document contacts.s.txt.
# Contact List
## contact:
* address: 1234 Bank St.
In this case there are actually two triples.
contacts.s.txt contact :_b1
:_b1 address 1234 Bank St.
The first triple indicates a relationship from the document to an un-named (blank node) entity with the contact predicate. The second triple fills in the address of this entity with an address predicate and a simple scalar object "1234 Bank St."
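To make the heading rules concrete, here is a rough Python sketch that turns anchored headings and predicate headings into triples. It is an illustration only and rests on several assumptions: headings below the top level start with two or more "#" characters, an anchor is written "(name)" at the start of the heading text, a predicate heading is a single word followed by a colon, and blank nodes are numbered in order of appearance. The function name heading_triples is made up.

```python
import re

HEADING = re.compile(r"^(#{2,}) (.*)$")   # any heading below the top level
ANCHOR = re.compile(r"^\((\w+)\)")        # e.g. "(e1) Entity 1"
PREDICATE = re.compile(r"^(\w+):$")       # e.g. "contact:"
POINT = re.compile(r"^\* (\w+): (.*)$")   # e.g. "* address: 1234 Bank St."

def heading_triples(doc_name, text):
    """Return (subject, predicate, object) tuples for points and semantic headings."""
    triples = []
    stack = [(1, doc_name)]  # (heading level, subject); the document sits at level 1
    blank_count = 0
    for line in text.splitlines():
        heading = HEADING.match(line)
        point = POINT.match(line)
        if heading:
            level = len(heading.group(1))
            title = heading.group(2).strip()
            # Close any headings at the same or a deeper level.
            while len(stack) > 1 and stack[-1][0] >= level:
                stack.pop()
            parent = stack[-1][1]
            anchor = ANCHOR.match(title)
            relation = PREDICATE.match(title)
            if anchor:
                subject = f"{doc_name}:@{anchor.group(1)}"
            else:
                # Un-named entity: give it a blank node identifier.
                blank_count += 1
                subject = f":_b{blank_count}"
                if relation:
                    triples.append((parent, relation.group(1), subject))
            stack.append((level, subject))
        elif point:
            triples.append((stack[-1][1], point.group(1), point.group(2)))
    return triples
```

Run on the contact list above with the name contacts.s.txt, this sketch returns the same two triples. An anchored heading such as "## (e1) Entity 1" produces no linking triple, since that entity stands apart from the document.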
So far we have only looked at scalars with simple string datatypes. It is possible, and even preferable, to assign specific datatypes to them where possible. Tools can provide richer experiences and capabilities if they know that a particular string is a number, a date, or one of many other possible datatypes. For example, a DateTime might be shown in a calendar widget, or suffixed with "x days ago" or "y years ago." Addresses might enable geolocation capabilities, such as distance from the current location, or map displays. A datatype for a scalar value can be provided using a special notation.
## Sales Meeting
* startDate: 2024-06-12T12:00:00Z^^^DateTime
With the extra "^^^" and string at the end the semantics indicate that the object "2024-06-12T12:00:00Z" is actually a date and time in the ISO-8601 format. Here's the semantic "triple" that's actually a quad with the extra information.
:_b1 startDate 2024-06-12T12:00:00Z DateTime
If a tool can recognize this datatype it might be able to provide extra capabilities, such as searching for references to dates within a range. It can do this without detailed knowledge of the type of entity, or its schema. The ability for tools to operate on the semantic data without a full picture is very useful and makes things much simpler.
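As a small illustration of the date-range idea, here is a hedged Python sketch. It assumes that the datatype follows the last "^^^" in the value and that DateTime values are ISO 8601 strings ending in "Z". The helper names split_datatype and in_range are invented for this example.

```python
from datetime import datetime, timezone

def split_datatype(value):
    """Split 'text^^^Type' into (text, Type); Type is None when no marker is present."""
    text, marker, datatype = value.rpartition("^^^")
    if not marker:
        return value, None
    return text, datatype

def in_range(value, start, end):
    """True when value is a DateTime scalar that falls between start and end."""
    text, datatype = split_datatype(value)
    if datatype != "DateTime":
        return False  # not a date, so it cannot match a date range
    # fromisoformat on older Python versions rejects a trailing "Z", so spell out UTC.
    moment = datetime.fromisoformat(text.replace("Z", "+00:00"))
    return start <= moment <= end

if __name__ == "__main__":
    june_start = datetime(2024, 6, 1, tzinfo=timezone.utc)
    june_end = datetime(2024, 6, 30, tzinfo=timezone.utc)
    print(in_range("2024-06-12T12:00:00Z^^^DateTime", june_start, june_end))
```

Note that in_range looks only at the object and its datatype. It needs no knowledge of which entity or predicate the value belongs to.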
So far you've seen a variety of types, datatypes, and predicates. Do they need to be declared somewhere? The answer is no, they don't need to be. You can if you want. This is a nice feature of RDF-style semantics. The schema can come later after the data is analyzed, unlike a database where it must be defined before any data is stored.
If you did want to take a look at a particular entity, where do you look? SuperTXT has one built-in predicate called "type" that indicates the type of an entity. This is axiomatic. Other than that, you can look for that entity in the current document. There might be a heading devoted to it.
* foo: bar
## (foo) Foo predicate
The foo predicate is used to indicate a bar. Only an object "bar" is permitted as the range of this predicate.
This predicate is declared later on in the document and has a short description of its purpose. It can also be defined more formally as part of an ontology, because ontologies are just another collection of triples. This is left as an exercise for the reader.
There are places on the internet working towards common ontologies.
If you are using vocabulary from a particular ontology you can include it into your SuperTXT document, which will both provide a reference in your document and semantically include it. You don't need to explicitly include the [:@schema site] into your document since that entire vocabulary is implicitly included in any SuperTXT document.
So far we've only covered how to describe internal entities in a document. It can be useful to provide descriptions of external entities too, where the subject of the triples exists somewhere else. Why is this useful?
To describe an external entity you use a special kind of quotation followed immediately by a link line to the target document.
> # Study of dietary habits of walrus populations
> * dateModified: 2019-09-30T15:55:00Z^^^DateTime
=> (walruses) sci.gov:papers/2019/study12345.s.txt
Here we can see that the [:@walruses] are eating ...
Within the quote section you are free to include/exclude lines as needed for summarization and referencing the source material. Adding new semantic lines that don't exist in the destination document is the way to add semantics that it doesn't or couldn't include.
There are two further ways to add semantic information to the current document: preformatted blocks and includes. While SuperTXT is quite versatile, these two techniques provide a way to use the versatility of other formats. If a preformatted block adds entities and semantics, those are automatically included in the parent document. For example, you can add some triples using LTSV (Labeled Tab-Separated Values) like this.
Here are some triples.
``` .l.tsv
id:bob	type:Person	givenName:Bob	familyName:Doe
id:jane	type:Person	givenName:Jane	familyName:Doe
```
The effect of this preformatted block is to add these triples to the current document in a compact tabular form.
:@bob type Person
:@bob givenName Bob
:@bob familyName Doe
:@jane type Person
:@jane givenName Jane
:@jane familyName Doe
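The expansion from LTSV rows to triples is mechanical, as this short Python sketch suggests. It assumes fields are separated by tab characters, each field is "label:value", and the "id" label names the subject, as in the example above. The function name ltsv_triples is hypothetical.

```python
def ltsv_triples(block):
    """Expand LTSV rows into (subject, predicate, object) tuples."""
    triples = []
    for line in block.splitlines():
        if not line.strip():
            continue  # skip blank rows
        # Each field is "label:value"; split on the first colon only.
        fields = dict(
            field.split(":", 1) for field in line.split("\t") if ":" in field
        )
        # Assumption: the "id" label identifies the entity the row describes.
        subject = ":@" + fields.pop("id")
        for predicate, value in fields.items():
            triples.append((subject, predicate, value))
    return triples
```

Each row contributes one triple per label other than "id", which is why the two rows above expand to six triples.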
The effect is the same when using includes, but the content is added in from the included file. This is recommended only when there's an amount of data that would make the file much more difficult to read.
SuperTXT offers a very lightweight set of semantics, but with some powerful capabilities that approach the expressiveness of technologies such as HTTP and RDF. Both humans and machines can take advantage of it for increased expressive power and better tooling.
HAVE SOME FEEDBACK ON THIS DOCUMENT?
You can provide a conventional comment on this document.
ssh email@example.com ccmnt semantics.s.txt <<EOF
suggestion: Here's my actionable suggestion.
EOF