//! The Tor directory mirror implementation.
//!
//! # Specifications
//!
//! * [Directory cache operation](https://spec.torproject.org/dir-spec/directory-cache-operation.html).
//!
//! # Rationale
//!
//! The network documents defined in the directory specification are a
//! fundamental part of the Tor protocol.  They are used to create and
//! distribute a canonical list of all relays in the Tor network, giving every
//! client the same view of the network.  This unified view is important for
//! defending against partitioning attacks and other attacks on distributed
//! networks.
//!
//! These network documents are generated, signed, and served by so-called
//! "directory authorities", a set of 10-ish highly trusted Tor relays that
//! more or less govern the entire Tor network.
//!
//! Now here comes the bottleneck: Tor has millions of daily active users but
//! only 10-ish relays responsible for these crucial documents.  Having all
//! clients download from those 10-ish relays would overload them.  If the
//! traffic to those relays were so high that they could no longer communicate
//! and coordinate among themselves, the entire Tor network could shut down.
//!
//! Fortunately, all network documents are either directly or indirectly signed
//! by well-known keys of the directory authorities.  This makes mirroring them
//! trivial, because their authenticity can be established by cryptographic
//! signatures rather than by the TLS connection they arrive over.
//!
//! This is where directory mirrors come in handy.  Directory mirrors
//! (previously known as "directory caches") are ordinary relays that mirror all
//! network documents from the authorities by implementing the respective routes
//! for all HTTP GET endpoints from the relays.
//!
//! The network documents are usually served through ordinary Tor circuits,
//! by accepting incoming connections through `RELAY_BEGIN_DIR` cells.
//! In the past, some relays optionally opened an additional socket on the
//! ordinary Internet through a dedicated `SocketAddr`, known as the
//! "directory address".  Since about 2020, this is no longer done.  However,
//! the functionality still exists, and this module is written to be fairly
//! agnostic about how it accepts such connections, as directory authorities
//! continue to advertise their directory address.

use std::{convert::Infallible, path::PathBuf};

use futures::Stream;
use tokio::io::{AsyncRead, AsyncWrite};
use tor_dircommon::{
    authority::AuthorityContacts,
    config::{DirTolerance, DownloadScheduleConfig},
};

mod operation;

/// Core data type of a directory mirror.
///
/// # External Notes
///
/// This structure serves as the entry point to the [`mirror`](crate::mirror)
/// API.  It represents an instance that can be launched using
/// [`DirMirror::serve`].  Calling this method consumes the instance, as is
/// common for objects representing server-like things, so as not to imply that
/// the instance is merely a configuration template.
///
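/// The consume-on-launch pattern described above can be sketched as follows.
/// This is a minimal illustration only: `Server`, its `name` field, and
/// `launch` are hypothetical stand-ins and not part of this crate.
///
/// ```rust
/// // Hypothetical stand-in for a server-like type; not part of this crate.
/// #[derive(Debug)]
/// struct Server {
///     name: String,
/// }
///
/// impl Server {
///     // Takes `self` by value, so the caller gives up the instance.
///     fn serve(self) -> String {
///         format!("{} is serving", self.name)
///     }
/// }
///
/// // Builds a server and launches it.
/// fn launch() -> String {
///     let server = Server { name: "mirror".to_owned() };
///     // `server` is moved into `serve` here; any later use of `server`
///     // in this function would be rejected by the compiler.
///     server.serve()
/// }
/// ```
///
/// Because `serve` takes `self` by value, a launched instance cannot be
/// mistaken for a reusable configuration template.
///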
/// # Internal Notes
///
/// For now, this data structure only holds configuration options, as an ad-hoc
/// replacement for a hypothetical `DirMirrorConfig` structure that does not
/// exist yet.
///
/// I assume that in the future, regardless of the configuration, this might also
/// hold other fields such as access to the database pool, etc.  The question
/// is whether this structure will be passed around with locking mechanisms,
/// or will just be used to extract configuration options initially in the
/// consuming function, which then applies further wrapping or not.
#[derive(Debug)]
#[non_exhaustive]
pub struct DirMirror {
    /// The [`PathBuf`] where the [`database`](crate::database) is located.
    path: PathBuf,
    /// The [`AuthorityContacts`] data structure for contacting authorities.
    authorities: AuthorityContacts,
    /// The [`DownloadScheduleConfig`] used for properly retrying downloads.
    schedule: DownloadScheduleConfig,
    /// The [`DirTolerance`] to tolerate clock skews.
    tolerance: DirTolerance,
}

impl DirMirror {
    /// Creates a new [`DirMirror`] with a given set of configuration options.
    ///
    /// # Parameters
    ///
    /// * `path`: The [`PathBuf`] where the database is located.
    /// * `authorities`: The [`AuthorityContacts`] data structure for contacting authorities.
    /// * `schedule`: The [`DownloadScheduleConfig`] used for properly retrying downloads.
    /// * `tolerance`: The [`DirTolerance`] to tolerate clock skews.
    ///
    /// # Notes
    ///
    /// **Beware of [`DirTolerance::default()`]!**  Its default values are
    /// intended for clients, not directory mirrors.  Tolerances of several days
    /// are not recommended for directory mirrors.  Consider using something in
    /// the minute range instead, such as `60s`, which is what C Tor uses.[^1]
    ///
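    /// To see why the size of the tolerance matters, consider this minimal
    /// sketch of a clock-skew check.  It is not how [`DirTolerance`] is
    /// implemented: `is_timely` is a hypothetical helper, and it assumes a
    /// single tolerance applied symmetrically to both ends of a document's
    /// validity window.
    ///
    /// ```rust
    /// use std::time::{Duration, SystemTime};
    ///
    /// // Hypothetical helper, not part of this crate: returns whether `now`
    /// // lies within the validity window widened by `tolerance` on both sides.
    /// fn is_timely(
    ///     now: SystemTime,
    ///     valid_after: SystemTime,
    ///     valid_until: SystemTime,
    ///     tolerance: Duration,
    /// ) -> bool {
    ///     // A local clock running slow makes fresh documents look "not yet valid".
    ///     let earliest = valid_after.checked_sub(tolerance).unwrap_or(valid_after);
    ///     // A local clock running fast makes them look "expired".
    ///     let latest = valid_until.checked_add(tolerance).unwrap_or(valid_until);
    ///     earliest <= now && now <= latest
    /// }
    /// ```
    ///
    /// With a tolerance of 60 seconds, a document that becomes valid 30 seconds
    /// from now is accepted, while one that expired two minutes ago is not.
    /// With a tolerance of several days, a mirror would keep accepting that
    /// expired document for days.
    ///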
    /// TODO DIRMIRROR: This is unacceptable for the actual release.  We **NEED**
    /// a proper way to configure this, such as with a `DirMirrorConfig` struct
    /// that can properly deserialize from configuration files and such.
    /// However, this task is not trivial and may be one of the hardest parts of
    /// this entire development, as it would involve a radical change to many
    /// higher-level crates.  The reason is that we need a clean way to share
    /// "global" settings, such as the list of authorities, with various
    /// sub-configurations, such as the configuration for the directory mirror.
    /// We must not offer separate configurations of the list of authorities
    /// for those different components.  That would result in lots of boilerplate
    /// and potentially wrong behavior: those resources affect so many parts of
    /// the Tor protocol that a consistent view must be assumed in order to
    /// avoid surprises.
    ///
    /// [^1]: <https://gitlab.torproject.org/tpo/core/tor/-/blob/0b20710/src/feature/nodelist/networkstatus.c#L1890>.
    pub fn new(
        path: PathBuf,
        authorities: AuthorityContacts,
        schedule: DownloadScheduleConfig,
        tolerance: DirTolerance,
    ) -> Self {
        Self {
            path,
            authorities,
            schedule,
            tolerance,
        }
    }

    /// Consumes the [`DirMirror`] by running endlessly in the current task.
    ///
    /// This method accepts a `listener`, which is a [`Stream`] yielding a
    /// [`Result`], in order to model a generic way of accepting incoming
    /// connections.  Think of `S` as the file descriptor you would call
    /// `accept(2)` upon if you were in C.  The idea behind this generic is,
    /// as outlined in the module documentation, that a [`DirMirror`] can
    /// handle incoming connections in multiple ways, such as by serving
    /// through an ordinary TCP socket or through a Tor circuit in combination
    /// with a `RELAY_BEGIN_DIR` cell.  How this is done concretely is outside
    /// the scope of this crate; instead, we provide the primitives that make
    /// such flexibility possible.
    #[allow(clippy::unused_async)] // TODO
    pub async fn serve<S, T, E>(self, _listener: S) -> Result<(), Infallible>
    where
        S: Stream<Item = Result<T, E>> + Unpin,
        T: AsyncRead + AsyncWrite + Unpin + Send + 'static,
        E: std::error::Error,
    {
        todo!()
    }
}