# SuperTXT Semantics

* type: Guide
* dateModified: 2023-04-25T20:16:42-04:00^^^DateTime

One of the principles of SuperTXT is to make documents readable by both humans and machines while keeping a simple text format. A machine can infer a lot by interpreting text and following the interconnections created by hyperlinks. Popular search engines have been doing this for decades, at a considerable cost in complexity. What if we could put explicit semantic information into our documents so that simple tools can read it?

## Document semantics

Many of the documents on this site contain a point directly underneath the top-level heading that looks like this.

.s.txt

# Heading
* type: Guide

Notice that the point sits directly underneath the single top-level heading, and that it contains a colon. This is a semantic point for the document.

With this convention we now have a very simple way to find out which type each document is, and to find all the documents of a particular type. Many tools can walk a file tree searching for a snippet of text. Here is the text we can search for to find all the types.

.txt

* type:

For better precision we can use a regular expression, which is also widely supported. This one anchors at the beginning of a line and greedily matches until the end of the line.

.txt

^\* type: .*
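As a rough illustration, the same search can be done in a few lines of Python using the regular expression above. This is a sketch, not part of any SuperTXT tooling: the function names `type_lines` and `find_types` are hypothetical, and it assumes SuperTXT documents use the `.s.txt` extension, as the documents on this site do.

```python
import re
from pathlib import Path

# The same pattern as above: a "* type: " point at the start of a line,
# greedily matched to the end of the line.
TYPE_POINT = re.compile(r"^\* type: .*")


def type_lines(name, text):
    """Return grep-style 'name:line' strings for every type point in text."""
    return [f"{name}:{line}" for line in text.splitlines() if TYPE_POINT.match(line)]


def find_types(root="."):
    """Walk root for *.s.txt files (assumed extension) and collect their type points."""
    root = Path(root)
    hits = []
    for path in sorted(root.rglob("*.s.txt")):
        text = path.read_text(encoding="utf-8")
        hits.extend(type_lines(path.relative_to(root).as_posix(), text))
    return hits
```

Calling `find_types(".")` from the root of a copy of the site returns lines in the same `path:line` shape that grep prints.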

If we run something like grep over this site, we might get output like this, giving an overview of the documents and their types.

.txt

whats-sshla.s.txt:* type: Article
intro-to-ssc.s.txt:* type: Article
specs/pathname2.s.txt:* type: Specification
specs/command-reflection.s.txt:* type: Specification
specs/show-output.s.txt:* type: Specification
specs/supertxt.s.txt:* type: Specification
specs/report-progress.s.txt:* type: Specification
00-intro.s.txt:* type: Introduction
hosting.s.txt:* type: Guide
browsing.s.txt:* type: Guide
semantics.s.txt:* type: Guide
start.s.txt:* type: GettingStartedGuide

Each line tells us something about a subject, which in this case is the document. Each line also focuses on one particular aspect of that document, the predicate "type." The last piece of information is the value (object) of that aspect. These three pieces of information form a subject-predicate-object "triple" that expresses one statement of semantic meaning. This kind of semantics resembles a much simplified form of RDF (Resource Description Framework).

→ https://en.wikipedia.org/wiki/Resource_Description_Framework

Every piece of semantic information in SuperTXT has this structure, with a subject, a predicate and an object.

.txt

<subject>	<predicate>	<object>

For semantics about the document, as in the examples above, the subject is the document itself. The predicate is the portion of the point before the colon character, and the object is the remainder of the point. Document semantic points cannot be placed further down in the document; they must appear before any other SuperTXT elements, such as paragraphs, quotes and preformatted blocks. This helps readers, who can see the details of the document immediately, and it helps tools, which can pull this information out quickly without reading the entire document.
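Here is a minimal Python sketch of a reader for document semantic points that stops at the first other element, so it never needs the rest of the document. The function name is hypothetical, and two details are assumptions rather than rules stated in this guide: blank lines are skipped, and the predicate is everything before the first colon (so colons inside a value, such as a timestamp, are kept).

```python
from typing import Iterable, List, Tuple

Triple = Tuple[str, str, str]


def document_points(subject: str, lines: Iterable[str]) -> List[Triple]:
    """Collect (subject, predicate, object) triples from the points under the
    top-level heading, stopping at the first element that is not a point."""
    triples: List[Triple] = []
    seen_heading = False
    for raw in lines:
        line = raw.rstrip("\n")
        if not seen_heading:
            if line.startswith("# "):
                seen_heading = True
            elif line.strip():
                break  # assumption: no top-level heading first means no document points
            continue
        if not line.strip():
            continue  # assumption: blank lines are allowed around the points
        if line.startswith("* ") and ":" in line:
            # Split on the first colon only, so values like timestamps stay whole.
            predicate, _, obj = line[2:].partition(":")
            triples.append((subject, predicate.strip(), obj.strip()))
            continue
        break  # any other element ends the document's semantic points
    return triples
```

Because it consumes lines lazily, passing an open file, for example `with open(name, encoding="utf-8") as f: document_points(name, f)`, stops reading shortly after the semantic points end.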

thought: The semantic points of a SuperTXT document can serve a purpose similar to HTTP headers, except that they are directly visible to the user, which is one of our values.

## Semantic headings

Sometimes semantics don't have simple scalar values (i.e. a simple string). The object of a triple might instead be an entity separate from the document, one that is the subject of its own semantic triples. This is where semantic headings come in. You can declare a heading in your document that has its own semantic points.

.s.txt

## (e1) Entity 1
* foo: bar

Here the triple looks something like this.

.txt

document.s.txt:@e1	foo	bar

Notice that the anchor gives the entity the identifier e1, which makes it the subject of this triple. The predicate is "foo" and the object is "bar." This entity is completely separate from the document, so it can live in any document.
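A small Python sketch of how a tool might turn an anchored heading and its points into triples, using the `document.s.txt:@e1` subject notation shown above. The heading pattern is inferred from this one example, and ending the entity's points at the first non-point line is an assumption made for the sketch; the function name is hypothetical.

```python
import re
from typing import List, Tuple

# "## (anchor) Title": a heading of level 2 or deeper with an anchor in
# parentheses (pattern inferred from the example above).
ANCHORED_HEADING = re.compile(r"^#{2,} \((?P<anchor>[^)\s]+)\)")


def anchored_entities(doc_name: str, text: str) -> List[Tuple[str, str, str]]:
    """Return triples whose subject is an anchored heading, e.g. doc.s.txt:@e1."""
    triples = []
    subject = None
    for line in text.splitlines():
        m = ANCHORED_HEADING.match(line)
        if m:
            subject = f"{doc_name}:@{m.group('anchor')}"
            continue
        if line.startswith("#"):
            subject = None  # a heading without an anchor is not handled here
            continue
        if subject and line.startswith("* ") and ":" in line:
            predicate, _, obj = line[2:].partition(":")
            triples.append((subject, predicate.strip(), obj.strip()))
        elif subject and line.strip():
            subject = None  # assumption: other content ends the entity's points
    return triples
```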

Sometimes the entity only makes sense in relation to a document or to a higher-level entity (heading). In this case you can put the predicate into the heading, which indicates the entity's relationship to its parent. Here is a contact list document, contacts.s.txt.

.s.txt

# Contact List

## contact:
* address: 1234 Bank St.

In this case there are actually two triples.

.txt

contacts.s.txt	contact	:_b1
:_b1	address	1234 Bank St.

The first triple relates the document to an unnamed entity (a blank node) through the contact predicate. The second triple fills in the address of this entity with the address predicate and the simple scalar object "1234 Bank St."

note: It is possible to have an entity that has both an anchor and a predicate in a heading.
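The blank-node case can be sketched in Python as well. This hypothetical helper assigns a fresh `:_bN` identifier to each `## predicate:` heading and links it to the document. It assumes the parent is always the document and does not handle headings that also carry an anchor, so nested entities and the combined form in the note above are out of scope.

```python
import itertools
import re
from typing import List, Tuple

# "## predicate:" with nothing else on the line (assumed form, from the example).
PREDICATE_HEADING = re.compile(r"^#{2,} (?P<predicate>[A-Za-z][\w-]*):\s*$")


def predicate_entities(doc_name: str, text: str) -> List[Tuple[str, str, str]]:
    """Link the document to a blank node per predicate heading, then add its points."""
    counter = itertools.count(1)
    triples = []
    node = None
    for line in text.splitlines():
        m = PREDICATE_HEADING.match(line)
        if m:
            node = f":_b{next(counter)}"  # fresh blank node for each heading
            triples.append((doc_name, m.group("predicate"), node))
            continue
        if line.startswith("#"):
            node = None  # any other heading ends the current entity
            continue
        if node and line.startswith("* ") and ":" in line:
            predicate, _, obj = line[2:].partition(":")
            triples.append((node, predicate.strip(), obj.strip()))
        elif line.strip():
            node = None  # assumption: other content ends the entity's points
    return triples
```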

## Scalar datatypes

So far we have only looked at scalars with a simple string datatype. It is possible, and even preferable, to assign specific datatypes to them where you can. Tools can provide richer experiences and capabilities if they know that a particular string is a number, a date, or one of many other possible datatypes. For example, a DateTime might be shown in a calendar widget, or suffixed with "x days ago" or "y years ago." An address might enable geolocation capabilities, such as distance from the current location, or map displays. A datatype for a scalar value can be provided using a special notation.

.s.txt

## Sales Meeting
* startDate: 2024-06-12T12:00:00Z^^^DateTime

The "^^^" suffix followed by a datatype name indicates that the object "2024-06-12T12:00:00Z" is actually a date and time in ISO-8601 format. Here is the resulting semantic "triple," which is actually a quad because of the extra datatype information.

.txt

:_b1	startDate	2024-06-12T12:00:00Z	DateTime

If a tool recognizes this datatype it might provide extra capabilities, such as searching for references to dates within a range. It can do this without detailed knowledge of the type of entity or its schema. The ability of tools to operate on the semantic data without a full picture is very useful and keeps things much simpler.
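The sketch below shows, in Python, how a tool could split the `^^^` suffix off a value and then search for DateTime values in a range without knowing anything else about the entities. The function names are hypothetical, and it assumes every DateTime value carries a UTC offset or a trailing `Z`. Python's `datetime.fromisoformat` only accepts `Z` from version 3.11, so the sketch rewrites it as `+00:00`.

```python
from datetime import datetime, timezone
from typing import Iterable, List, Optional, Tuple

Quad = Tuple[str, str, str, Optional[str]]


def to_quad(subject: str, predicate: str, value: str) -> Quad:
    """Split 'value^^^Datatype' into (subject, predicate, value, datatype)."""
    lexical, sep, datatype = value.rpartition("^^^")
    if not sep:
        return (subject, predicate, value, None)  # plain string, no datatype
    return (subject, predicate, lexical, datatype)


def parse_datetime(text: str) -> datetime:
    # Normalise a trailing "Z" so older Pythons can parse it as UTC.
    if text.endswith("Z"):
        text = text[:-1] + "+00:00"
    return datetime.fromisoformat(text)


def dates_in_range(quads: Iterable[Quad], start: datetime, end: datetime) -> List[Quad]:
    """Return DateTime quads within [start, end]; start and end should be
    timezone-aware, e.g. datetime(2024, 1, 1, tzinfo=timezone.utc)."""
    return [q for q in quads
            if q[3] == "DateTime" and start <= parse_datetime(q[2]) <= end]
```

Only the datatype is consulted, so a meeting's `startDate` and a document's `dateModified` are searched the same way.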

## Show me the schemas

So far you've seen a variety of types, datatypes, and predicates. Do they need to be declared somewhere? No, although you can declare them if you want. This is a nice feature of RDF-style semantics: the schema can come later, after the data is analyzed, unlike in a database, where it must be defined before any data is stored.

If you do want to look up a particular entity, where do you look? SuperTXT has one built-in predicate, "type," which indicates the type of an entity. This is axiomatic. Beyond that, you can look for the entity in the current document, where there might be a heading devoted to it.

.s.txt

* foo: bar

## (foo) Foo predicate

The foo predicate is used to indicate a bar. Only an object "bar" is permitted as the range of this predicate.

Here the predicate is declared later on in the document with a short description of its purpose. It can also be defined more formally as part of an ontology, because ontologies are just another collection of triples. This is left as an exercise for the reader.
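A tool could use the anchor convention to find such a declaration automatically. The hypothetical Python helper below looks for a heading anchored with the given name and returns its title and first paragraph; treating the first paragraph as the description is an assumption for illustration.

```python
import re
from typing import Optional, Tuple


def find_declaration(text: str, name: str) -> Optional[Tuple[str, str]]:
    """Return (heading title, first paragraph) for the heading anchored as (name)."""
    heading = re.compile(r"^#+ \(" + re.escape(name) + r"\)\s*(?P<title>.*)$")
    lines = text.splitlines()
    for i, line in enumerate(lines):
        m = heading.match(line)
        if not m:
            continue
        paragraph = []
        for following in lines[i + 1:]:
            if following.startswith("#"):
                break  # the next heading ends the declaration
            if following.strip():
                paragraph.append(following.strip())
            elif paragraph:
                break  # a blank line after text ends the first paragraph
        return m.group("title"), " ".join(paragraph)
    return None
```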

Some sites on the internet are working towards common ontologies.

→ (schema site) https://schema.org

note: SuperTXT relies heavily on the [:@schema site] for many of the semantics used on this site. It has a wide range of useful entities. If you come across an undeclared entity, be sure to check there for more information.

If you are using vocabulary from a particular ontology, you can include it in your SuperTXT document, which both provides a reference in your document and semantically includes the vocabulary. You don't need to explicitly include the [:@schema site], since its entire vocabulary is implicitly included in every SuperTXT document.

## Applying semantic information to external entities

So far we've only covered how to describe internal entities in a document. It can also be useful to describe external entities, where the subject of the triples exists somewhere else. Why is this useful?

* Adding semantic information to a document that doesn't have it, or whose file format can't express it
* Describing the state of another document at the time this one was written, to serve as a reference point for an analysis

To describe an external entity you use a special kind of quotation followed immediately by a link line to the target document.

.s.txt

> # Study of dietary habits of walrus populations
> * dateModified: 2019-09-30T15:55:00Z^^^DateTime
=> (walruses) sci.gov:papers/2019/study12345.s.txt

Here we can see that the [:@walruses] are eating ...

Within the quotation you are free to include or omit lines as needed to summarize and reference the source material. Adding semantic lines that don't exist in the target document is how you supply semantics that it doesn't or couldn't include.
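To make the subject explicit, here is a Python sketch that attaches the points inside a quotation to the document named on the link line that immediately follows it. The function name is hypothetical, only `=>` link lines are recognized, and the optional anchor in parentheses is skipped rather than recorded.

```python
import re
from typing import List, Tuple

# "=> (anchor) target" or "=> target"; the anchor is optional and ignored here.
LINK_LINE = re.compile(r"^=> (?:\([^)]+\) )?(?P<target>\S+)")


def external_triples(text: str) -> List[Tuple[str, str, str]]:
    """Attach points found in a quotation to the document linked right after it."""
    triples = []
    quoted: List[str] = []
    for line in text.splitlines():
        if line.startswith(">"):
            quoted.append(line[1:].lstrip())
            continue
        m = LINK_LINE.match(line)
        if m and quoted:
            subject = m.group("target")  # the external document is the subject
            for q in quoted:
                if q.startswith("* ") and ":" in q:
                    predicate, _, obj = q[2:].partition(":")
                    triples.append((subject, predicate.strip(), obj.strip()))
        quoted = []  # the link line, or any other line, ends the quotation
    return triples
```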

## Including semantic information in the current document

There are two ways to add semantic information from other formats to the current document: preformatted blocks and includes. While SuperTXT is quite versatile, these two techniques let you take advantage of the versatility of other formats. If a preformatted block declares entities and semantics, those are automatically included in the parent document. For example, you can add some triples using LTSV (Labeled Tab-separated Values) like this.

.s.txt

Here are some triples.

``` .l.tsv
id:bob	type:Person	givenName:Bob	familyName:Doe
id:jane	type:Person	givenName:Jane	familyName:Doe
```

The effect of this compact, tabular preformatted block is to add these triples to the current document.

.txt

:@bob	type	Person
:@bob	givenName	Bob
:@bob	familyName	Doe
:@jane	type	Person
:@jane	givenName	Jane
:@jane	familyName	Doe

note: The id predicate is handled specially with LTSV to create an anchor for the row. For CSV files and other tabular file formats the first column is assumed to be the anchor for the row.

The effect is the same when using includes, except that the content comes from the included file. Includes are recommended only when there is enough data that placing it inline would make the document much harder to read.
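The LTSV expansion can be sketched in a few lines of Python. The helper name is hypothetical; following the note above, the `id` label becomes the row's anchor, while giving rows without an `id` a blank node is an assumption this guide does not specify.

```python
from typing import List, Tuple


def ltsv_triples(block: str) -> List[Tuple[str, str, str]]:
    """Expand LTSV rows into triples; the 'id' label becomes the row's anchor."""
    triples = []
    blank = 0
    for row in block.splitlines():
        if not row.strip():
            continue
        # Each tab-separated field is "label:value"; split on the first colon only.
        fields = dict(field.split(":", 1) for field in row.split("\t"))
        anchor = fields.pop("id", None)
        if anchor is None:
            blank += 1
            subject = f":_b{blank}"  # assumption: rows without an id become blank nodes
        else:
            subject = f":@{anchor}"
        for label, value in fields.items():
            triples.append((subject, label, value))
    return triples
```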

## Conclusions

SuperTXT offers a very lightweight set of semantics, yet its capabilities approach the expressiveness of technologies such as HTTP and RDF. Both humans and machines can take advantage of it, for greater expressive power and better tooling.

HAVE SOME FEEDBACK ON THIS DOCUMENT?

You can provide a conventional comment on this document.

.sh

ssh nobody@supertxt.net ccmnt semantics.s.txt <<EOF
suggestion: Here's my actionable suggestion.
EOF