# Pathname2

type: Specification
version: 0.0.1
dateModified: 2023-04-23T15:38:16-04:00^^^DateTime

Unix pathnames are the fundamental way to reference files in the file system. Because many things in the system are represented as files, paths are used for a great many purposes. While they work very well, they can't address things that are not represented as files in the local file system, such as files on other servers.

Pathname2 addresses both local and remote files accessible over the internet. Some tools such as [:@SSHLA] have already begun to use network paths, but without a written specification. This document attempts to specify these new pathnames so that they can be used more widely.

## Format

.txt

[[user@]hostname[^port]:][pathname]

User (optional, requires hostname): The remote username; if omitted, the user is taken from the SSH configuration
Hostname (optional): Address of the remote server, as either a DNS name or an IP address
Port (optional, requires hostname): Port number of the SSH server; the default is port 22
Pathname (optional): Standard Unix pathname according to the specifications; the default is the current working directory

note: A valid pathname2 must not be an empty string

Any valid pathname is also a valid pathname2, unless it contains a colon, in which case each colon must be escaped with a backslash. The presence of a ":" preceded by a valid hostname is what distinguishes a pathname2 from a plain pathname.

A port must be entirely numeric, and a hostname cannot contain a colon. If the username contains a colon, it can be escaped with a backslash. For example, this is a valid pathname2:

.txt

my\:user@server1:abc.txt
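
The format above can be sketched as a small parser. This is a minimal illustration and not a reference implementation: the names `Pathname2`, `find_unescaped`, and `parse_pathname2` are hypothetical, and it assumes that the first unescaped ":" ends the host portion, that only the username is unescaped, and that the pathname is kept verbatim.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Pathname2:
    """Hypothetical container for the four portions of a pathname2."""
    user: Optional[str]
    host: Optional[str]
    port: Optional[int]  # None means "use the default SSH port"
    path: str            # "" means "default working directory"


def find_unescaped(text: str, char: str) -> int:
    """Return the index of the first `char` not escaped by a backslash, or -1."""
    i = 0
    while i < len(text):
        if text[i] == "\\":
            i += 2  # skip the backslash and the character it escapes
            continue
        if text[i] == char:
            return i
        i += 1
    return -1


def parse_pathname2(text: str) -> Pathname2:
    if text == "":
        raise ValueError("a pathname2 must not be an empty string")
    colon = find_unescaped(text, ":")
    if colon == -1:
        # No unescaped colon: this is a plain local pathname.
        return Pathname2(None, None, None, text)
    authority, path = text[:colon], text[colon + 1:]
    user = None
    at = find_unescaped(authority, "@")
    if at != -1:
        # Assumption: "\:" is the only escape that needs undoing in the user.
        user = authority[:at].replace("\\:", ":")
        authority = authority[at + 1:]
    host, caret, port_text = authority.partition("^")
    if not host or "/" in host:
        raise ValueError(f"invalid hostname in {text!r}")
    port = None
    if caret:
        if not (port_text.isascii() and port_text.isdigit()):
            raise ValueError(f"port must be entirely numeric in {text!r}")
        port = int(port_text)
    return Pathname2(user, host, port, path)
```

With this sketch, the example above parses into the user "my:user", the host "server1", no explicit port, and the pathname "abc.txt".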

## Path arithmetic

Since paths can be joined with relative paths and have other operations performed on them, it is important to cover what happens in the common cases. Consider the following directory path, which is joined with a simple filename to address a file on a remote server.

.txt

server1:specs <join> electrical-code.txt = server1:specs/electrical-code.txt

Similarly, a relative path that includes a directory can be joined to a remote directory.

.txt

server1:specs <join> electrical/wi-code.txt = server1:specs/electrical/wi-code.txt

A pathname2 without a pathname portion can also be joined with a pathname, which is then relative to the remote working directory.

.txt

server1: <join> poetry/mypoem.txt = server1:poetry/mypoem.txt

Other types of relative paths work virtually the same way as with Unix pathnames, including relative paths with "." and "..". The only difference is that the hostname portion remains the same throughout the operation.

note: Joining a pathname2 with a pathname cannot change the hostname portion.

If a pathname2 is joined with another pathname2 that has a hostname portion, the result is the second pathname2. There is no way to modify part of the hostname portion, and no relative operations apply to it.

.txt

server1:foo.txt <join> poetry-server:limerick55.txt = poetry-server:limerick55.txt
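
The join rules in this section can be sketched in a few lines. This is an illustration under stated assumptions, not a definitive implementation: `split_host` and `join_pathname2` are hypothetical names, the left side is assumed to name a directory, and the pathname portions are combined with the standard `posixpath.join`, so an absolute right-hand path replaces the left-hand pathname as it would locally.

```python
import posixpath
from typing import Optional, Tuple


def split_host(text: str) -> Tuple[Optional[str], str]:
    """Split at the first unescaped ':' into (host portion, pathname).

    The host portion is None for a plain local pathname.
    """
    i = 0
    while i < len(text):
        if text[i] == "\\":
            i += 2  # skip the backslash and the character it escapes
            continue
        if text[i] == ":":
            return text[:i], text[i + 1:]
        i += 1
    return None, text


def join_pathname2(left: str, right: str) -> str:
    right_host, _ = split_host(right)
    if right_host is not None:
        # A right side with its own hostname portion replaces the left side.
        return right
    left_host, left_path = split_host(left)
    joined = posixpath.join(left_path, right)
    if left_host is None:
        return joined
    # The hostname portion is carried over unchanged.
    return f"{left_host}:{joined}"
```

Calling `join_pathname2("server1:", "poetry/mypoem.txt")` in this sketch yields `server1:poetry/mypoem.txt`, matching the third example in this section.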

## Server relative vs. absolute paths

Both the original pathnames and this specification distinguish between relative and absolute paths. In the original form, relative paths are resolved in the local session against the current working directory. Remote relative paths, which lie outside the session, are resolved against the default working directory that the remote host assigns when a connection is made. Typically this is the home directory of the remote account, which makes remote relative paths relative to the user's own account. For account-independent access through a shared account such as "git" or "nobody", relative paths take on a shared, service-level significance, which makes them more sharable, like a hyperlink.

.txt

user@somesite.com:mygitrepo # This can be a different repo depending on the user

Remote absolute paths, like the local ones, have a more infrastructure-level significance, referring to more internal aspects of a server. Server and service administrators make use of these kinds of paths since they are more likely to be independent of the user account.

.txt

user1@myserver:/etc/server.conf # These two paths refer to the same file on the server
user2@myserver:/etc/server.conf

note: Depending on the level of abstraction of the service, and the trust of the users, absolute paths can be blocked entirely by the service using checks on the presence of a starting "/", or the presence of ".." segments.
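
The check described in the note above might look like the following sketch. The function name `is_service_relative` is hypothetical, and it is assumed to receive only the pathname portion, after the hostname portion has been removed.

```python
def is_service_relative(pathname: str) -> bool:
    """Return True when the pathname is relative and has no ".." segment."""
    if pathname.startswith("/"):
        return False  # absolute path: refers to server internals
    # Compare whole segments so that names such as "notes..txt" stay allowed.
    return ".." not in pathname.split("/")
```

A service using this sketch would accept `mygitrepo` and reject both `/etc/server.conf` and `repos/../../etc/server.conf`.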

## Spaces and special characters

Since pathname2 inherits from pathname, it also inherits its character encoding independence, except for certain reserved characters: /, ., :, @, ^. The recommendation remains that all encoding should be UTF-8 (strongly preferred) or another ASCII-compatible encoding, so that path comprehension code can scan bytes directly for these characters. This also frees pathnames, and therefore file and directory names, to use all available Unicode content while remaining readable. Here are some interesting path examples.

.txt

my trip to Ayutthaya ✈️.txt
cats.com:cats playing in fountains ⛲.s.txt
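
The reason byte scanning works with UTF-8 can be shown with a short sketch: every byte of a multi-byte UTF-8 sequence is 0x80 or higher, so a byte equal to a reserved ASCII character can only be that character. The helper name `reserved_positions` is hypothetical.

```python
from typing import List, Tuple

RESERVED = b"/.:@^"  # the reserved characters listed above, all single ASCII bytes


def reserved_positions(raw: bytes) -> List[Tuple[int, str]]:
    """Return (byte offset, character) for each reserved byte, without decoding."""
    return [(i, chr(b)) for i, b in enumerate(raw) if b in RESERVED]


raw = "cats.com:cats playing in fountains ⛲.s.txt".encode("utf-8")
found = reserved_positions(raw)

# The fountain symbol encodes to several bytes, none of which is in the ASCII range.
fountain_bytes = "⛲".encode("utf-8")
```

Here `found` lists only the dots and the colon from the path, and the bytes of the fountain symbol never collide with a reserved character.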

Pathnames can contain characters that have special meaning in certain shells. Unix shell reserved characters are commonly escaped with a backslash "\", or the path can be surrounded by quotes, which is preferred since it is more readable.
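
When a program builds a shell command line that contains a pathname2, the quoting can be delegated to a library. The sketch below uses Python's standard `shlex.quote`, which wraps unsafe strings in quotes; the helper name `shell_command` is hypothetical, and `cats` is simply the example command used elsewhere in this document.

```python
import shlex


def shell_command(program: str, path: str) -> str:
    """Build a POSIX shell command line with the path safely quoted."""
    return f"{shlex.quote(program)} {shlex.quote(path)}"


path = "cats.com:cats playing in fountains ⛲.s.txt"
command = shell_command("cats", path)

# Splitting the command line again recovers the original path as one argument.
arguments = shlex.split(command)
```

The round trip through `shlex.split` shows that the spaces and the symbol survive as a single argument.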

## Comparison to URL

Pathname2 should support all of the most commonly used capabilities of URLs, such as the server specifier, username, and port. Noticeably missing is the protocol portion, which is effectively replaced by the [:@SSHLA] system: the command determines the protocol, and everything runs over SSH.

The pathname portion of a pathname2 is equivalent to a URL path, except that it distinguishes between absolute (infrastructure) and relative (service-oriented) paths. This means the same path system can be used both by the teams managing the service and by its end users.

.txt

http://some-server.com/path/to/resource.txt # All paths are relative, never absolute
some-server.com:path/to/resource.txt # Relative path
some-server.com:/path/to/resource.txt # Absolute path, accessible to some, usually addresses the same file
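
The correspondence between the first two lines above can be sketched as a purely syntactic conversion. This is an assumption-laden illustration, not part of the specification: the function name `url_to_pathname2` is hypothetical, the URL path is treated as a relative (service-oriented) path, and the scheme, port, query, and fragment are dropped because they have no direct counterpart here.

```python
from urllib.parse import urlsplit


def url_to_pathname2(url: str) -> str:
    """Map a URL onto a relative pathname2, keeping only user, host, and path."""
    parts = urlsplit(url)
    host = parts.hostname
    if not host:
        raise ValueError(f"URL has no host: {url!r}")
    if ":" in host:
        # A pathname2 hostname cannot contain a colon (for example IPv6 literals).
        raise ValueError(f"hostname cannot be represented: {host!r}")
    prefix = host
    if parts.username:
        prefix = f"{parts.username}@{prefix}"
    # The URL port names an HTTP listener rather than the SSH port,
    # so it is deliberately not carried over.
    return f"{prefix}:{parts.path.lstrip('/')}"
```

Applied to `http://some-server.com/path/to/resource.txt`, this sketch produces `some-server.com:path/to/resource.txt`.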

Query parameters are intentionally out of scope for this specification, since in this space they are more commonly specified by the command, which has a much more flexible way of defining its own query syntax. Here's an abstract comparison with the "srch" command.

.txt

http://some-server.com/query/my/service?q=this+is+my+query

srch -C some-server.com: <<EOF
this is my query
EOF

You will notice that the search command doesn't have to encode the spaces since it makes use of the "heredoc" capability of the Unix shell, reading the query in plain text from standard input.
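
The same contrast can be reproduced programmatically. In the sketch below, the URL form needs the query to be encoded with the standard `urllib.parse.urlencode`, while the command form passes the text unchanged on standard input. Since "srch" is only an abstract example, a small Python process that echoes its input stands in for it; that stand-in is an assumption made for illustration.

```python
import subprocess
import sys
from urllib.parse import urlencode

query = "this is my query"

# URL form: the query must be encoded to fit into the query string.
url = "http://some-server.com/query/my/service?" + urlencode({"q": query})

# Command form: the query travels unchanged on standard input.
# This echo process is a stand-in for the hypothetical "srch" command.
echo = [sys.executable, "-c", "import sys; sys.stdout.write(sys.stdin.read())"]
result = subprocess.run(echo, input=query, capture_output=True, text=True, check=True)
```

The value of `url` ends with `q=this+is+my+query`, whereas `result.stdout` holds the query exactly as it was written.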

URL fragments are also out of scope for this specification, since fragments are interpreted by the viewer or shell, not processed by the command or service. When multiple commands are run through a pipe, or otherwise joined together, the fragment applies to the combined output and not to an individual path. There is a common convention for encoding fragment-link position specifiers using specially formatted arguments after a shell comment, like this:

.txt

find . -name "foo.txt"  # [:1] Shell/viewer might highlight the first line of the output here if it supports the convention
cats "space bar.s.txt"  # [:@list_of_drinks] This might bring you directly to this named anchor in the document
curl http://myspace.com # "[:/blink tags .* harmful/]" Bring me to the first occurrence of text that matches this regular expression

HAVE SOME FEEDBACK ON THIS DOCUMENT?

You can provide a conventional comment on this document.

.sh

ssh nobody@supertxt.net ccmnt specs/pathname2.s.txt <<EOF
suggestion: Here's my actionable suggestion.
EOF