Grcov report - parse2.rs

//! New netdoc parsing arrangements, with `derive`
//!
//! # Parsing principles
//!
//! A parseable network document is a type implementing [`NetdocParseable`],
//! usually via the
//! [`NetdocParseable` derive-deftly macro](crate::derive_deftly_template_NetdocParseable).
//!
//! A document type is responsible for recognising its own heading item.
//! Its parser will also be told of other structural items that it should not consume.
//! The structural lines can then be used to pass control to the appropriate parser.
//!
//! A "structural item" is a netdoc item that defines the structure of the document.
//! This includes the intro items for whole documents,
//! the items that introduce document sections
//! (which we model by treating the section as a sub-document)
//! and signature items (which introduce the signatures at the end of the document,
//! and after which no non-signature items may appear).
//!
//! # Ordering
//!
//! We don't always parse things into a sorted order.
//! Sorting will be done when assembling documents, before outputting.
// TODO we don't implement deriving output yet.
//!
//! # Types, and signature handling
//!
//! Most top-level network documents are signed somehow.
//! In this case there are three types:
//!
//!   * **`FooUnverified`**: a signed `Foo`, with its signatures, not yet verified.
//!     Implements [`NetdocUnverified`],
//!     typically by invoking the
//!     [`NetdocParseableUnverified` derive macro](crate::derive_deftly_template_NetdocParseableUnverified)
//!     on `Foo`.
//!
//!     Type-specific methods are provided for verification,
//!     to obtain a `Foo`.
//!
//!   * **`Foo`**: the body data for the document.
//!     This doesn't contain any signatures.
//!     Having one of these to play with means signatures have already been validated.
//!     Can be parsed as part of the signed document,
//!     via the `NetdocParseable` implementation on `FooUnverified`,
//!     and then obtained via `.verify_...` method(s) on `FooUnverified`.
//!
//!   * **`FooSignatures`**: the signatures for a `Foo`.
//!     Implements `NetdocParseableSignatures`, via
//!     [derive](crate::derive_deftly_template_NetdocParseableSignatures),
//!     with `#[deftly(netdoc(signatures))]`.
//!
//! # Relationship to tor_netdoc::parse
//!
//! This is a completely new parsing approach, based on different principles.
//! The key principle is the recognition of "structural keywords",
//! recursively within a parsing stack, via the [`NetdocParseable`] trait.
//!
//! This allows the parser to be derived.  We have type-driven parsing
//! of whole Documents, Items, and their Arguments and Objects,
//! including of their multiplicity.
//!
//! The different keyword handling means we can't use most of the existing lexer,
//! and need a new item parsing API:
//!
//!  * [`NetdocParseable`] trait.
//!  * [`KeywordRef`] type.
//!  * [`ItemStream`], [`UnparsedItem`], [`ArgumentStream`], [`UnparsedObject`].
//!
//! The different error handling means we have our own error types.
//! (The crate's existing parse error type has information that we don't track,
//! and is also a portmanteau error for parsing, writing, and other functions.)
//!
//! Document signing is handled in a more abstract way.
//!
//! Some old netdoc constructs are not supported.
//! For example, the obsolete `opt` prefix on safe-to-ignore Items.
//! The parser may make different decisions about netdocs with anomalous item ordering.

#[doc(hidden)]
#[macro_use]
pub mod internal_prelude;

#[macro_use]
mod structural;

#[macro_use]
mod derive;

mod error;
mod impls;
pub mod keyword;
mod lex;
mod lines;
pub mod multiplicity;
mod signatures;
mod traits;

#[cfg(feature = "incomplete")]
pub mod poc;

use internal_prelude::*;

pub use error::{ArgumentError, ErrorProblem, ParseError, UnexpectedArgument, VerifyFailed};
pub use impls::times::NdaSystemTimeDeprecatedSyntax;
pub use keyword::KeywordRef;
pub use lex::{ArgumentStream, ItemStream, NoFurtherArguments, UnparsedItem, UnparsedObject};
pub use lines::{Lines, Peeked, StrExt};
pub use signatures::{
    HasUnverifiedParsedBody, NetdocParseableSignatures, NetdocUnverified, SignatureHashInputs,
    SignatureHashesAccumulator, SignatureItemParseable, SignaturesData, check_validity_time,
    check_validity_time_tolerance, sig_hashes,
};
pub use structural::{StopAt, StopPredicate};
pub use traits::{
    IsStructural, ItemArgumentParseable, ItemObjectParseable, ItemValueParseable, NetdocParseable,
    NetdocParseableFields,
};

#[doc(hidden)]
pub use derive::netdoc_parseable_derive_debug;

pub(crate) use internal_prelude::EP;

//---------- input ----------

/// Options for parsing
///
/// Specific document and type parsing methods may use these parameters
/// to control their parsing behaviour at run-time.
#[derive(educe::Educe, Debug, Clone)]
#[allow(clippy::manual_non_exhaustive)]
#[educe(Default)]
pub struct ParseOptions {
    /// Retain unknown values?
    ///
    /// Some field types, especially for flags fields, have the capability to retain
    /// unknown flags.  But, whereas known flags can be represented as single bits,
    /// representing unknown flags involves allocating and copying strings.
    /// Unless the document is to be reproduced, this is a waste of effort.
    ///
    /// Each document field type affected by this option should store the unknowns
    /// as `Unknown<HashSet<String>>` or similar.
    ///
    /// This feature should only be used where performance is important.
    /// For example, it is useful for types that appear in md consensus routerdescs,
    /// but less useful for types that appear only in a netstatus preamble.
    ///
    /// This is currently used for router flags.
    #[educe(Default(expression = "Unknown::new_discard()"))]
    pub retain_unknown_values: Unknown<()>,

    // Like `#[non_exhaustive]`, but doesn't prevent use of struct display syntax with `..`
    #[doc(hidden)]
    _private_non_exhaustive: (),
}

/// Input to a network document top-level parsing operation
pub struct ParseInput<'s> {
    /// The actual document text
    input: &'s str,
    /// Filename (for error reporting)
    file: &'s str,
    /// Parsing options
    options: ParseOptions,
}

impl<'s> ParseInput<'s> {
    /// Prepare to parse an input string
    pub fn new(input: &'s str, file: &'s str) -> Self {
        ParseInput {
            input,
            file,
            options: ParseOptions::default(),
        }
    }
}

//---------- parser ----------

/// Common code for `parse_netdoc` and `parse_netdoc_multiple`
///
/// Creates the `ItemStream`, calls `parse_completely`, and handles errors.
fn parse_internal<T, D: NetdocParseable>(
    input: &ParseInput<'_>,
    parse_completely: impl FnOnce(&mut ItemStream) -> Result<T, ErrorProblem>,
) -> Result<T, ParseError> {
    let mut items = ItemStream::new(input)?;
    parse_completely(&mut items).map_err(|problem| ParseError {
        problem,
        doctype: D::doctype_for_error(),
        file: input.file.to_owned(),
        lno: items.lno_for_error(),
        column: problem.column(),
    })
}

/// Parse a network document - **toplevel entrypoint**
pub fn parse_netdoc<D: NetdocParseable>(input: &ParseInput<'_>) -> Result<D, ParseError> {
    parse_internal::<_, D>(input, |items| {
        let doc = D::from_items(items, StopAt(false))?;
        if let Some(_kw) = items.peek_keyword()? {
            return Err(EP::MultipleDocuments);
        }
        Ok(doc)
    })
}

/// Parse multiple concatenated network documents - **toplevel entrypoint**
pub fn parse_netdoc_multiple<D: NetdocParseable>(
    input: &ParseInput<'_>,
) -> Result<Vec<D>, ParseError> {
    parse_internal::<_, D>(input, |items| {
        let mut docs = vec![];
        while items.peek_keyword()?.is_some() {
            let doc = D::from_items(items, StopAt(false))?;
            docs.push(doc);
        }
        Ok(docs)
    })
}

/// Parse multiple network documents, also returning their offsets - **toplevel entrypoint**
///
/// Each returned document is accompanied by the byte offsets of its start and end.
///
/// (The netdoc metaformat does not allow anything in between subsequent documents in a file,
/// so the end of one document is the start of the next.)
///
/// This returns byte offsets rather than string slices,
/// because the caller can always convert the offsets into string slices,
/// but it is not straightforward to convert string slices borrowed from some input string
/// into offsets, in a way that is obviously correct without nightly `str::substr_range`.
///
/// Interfacing code can assume that slicing the input string with the returned
/// [`usize`] values will not cause an out-of-bounds error, meaning runtime
/// checks are not necessary there.
pub fn parse_netdoc_multiple_with_offsets<D: NetdocParseable>(
    input: &ParseInput<'_>,
) -> Result<Vec<(D, usize, usize)>, ParseError> {
    parse_internal::<_, D>(input, |items| {
        let mut docs = vec![];
        while items.peek_keyword()?.is_some() {
            let start_pos = items.byte_position();
            let doc = D::from_items(items, StopAt(false))?;
            let end_pos = items.byte_position();

            // Check start_pos and end_pos are in range.
            if input.input.get(start_pos..end_pos).is_none() {
                return Err(ErrorProblem::Internal("out-of-bounds bug?"));
            }

            docs.push((doc, start_pos, end_pos));
        }
        Ok(docs)
    })
}
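
The doc comment on `parse_netdoc_multiple_with_offsets` says that callers can turn the returned byte offsets back into string slices. The following is a minimal sketch of how calling code might do that, not code from this module. The helper names `slices_from_offsets` and `offsets_are_contiguous` are hypothetical, and the example works on plain `(D, usize, usize)` tuples so that it does not depend on the crate. It relies only on `str::get`, which returns `None` instead of panicking when a range is out of bounds or does not fall on a character boundary.

```rust
/// Hypothetical helper (not part of the module above): turn the
/// `(doc, start, end)` tuples into borrowed slices of the original input.
///
/// Returns `None` if any range is out of bounds or does not fall on a
/// UTF-8 character boundary, because `str::get` reports that as `None`.
pub fn slices_from_offsets<'s, D>(
    input: &'s str,
    docs: &[(D, usize, usize)],
) -> Option<Vec<&'s str>> {
    docs.iter()
        .map(|(_doc, start, end)| input.get(*start..*end))
        .collect()
}

/// Hypothetical check of the property stated in the doc comment:
/// each document starts exactly where the previous one ended.
pub fn offsets_are_contiguous<D>(docs: &[(D, usize, usize)]) -> bool {
    docs.windows(2).all(|pair| pair[0].2 == pair[1].1)
}
```

A caller holding the original input string and the parser's result could pass both to `slices_from_offsets` to recover the text of each document. Using `get` rather than indexing keeps the helper panic-free even though the parser already promises in-range offsets.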