summaryrefslogtreecommitdiff
diff options
context:
space:
mode:
authorakrasner <>2015-04-28 15:29:00 (GMT)
committerhdiff <hdiff@hdiff.luite.com>2015-04-28 15:29:00 (GMT)
commit9cf173e5897f3fe856ea4b135ed71c48e08bc411 (patch)
tree3254a8585541c51125934c953ffa30d62d39c3f6
version 0.1.0.0HEAD0.1.0.0master
-rw-r--r--AUTHORS1
-rw-r--r--COPYING121
-rw-r--r--ChangeLog17
-rw-r--r--INSTALL13
-rw-r--r--NEWS18
-rw-r--r--README20
-rw-r--r--Setup.hs2
-rw-r--r--src/Data/Position.hs24
-rw-r--r--src/Data/Position/Interface.hs352
-rw-r--r--src/Data/Position/Types.hs55
-rw-r--r--test/test.hs153
-rw-r--r--text-position.cabal43
12 files changed, 819 insertions, 0 deletions
diff --git a/AUTHORS b/AUTHORS
new file mode 100644
index 0000000..d7d6562
--- /dev/null
+++ b/AUTHORS
@@ -0,0 +1 @@
+fr33domlover <fr33domlover@riseup.net>
diff --git a/COPYING b/COPYING
new file mode 100644
index 0000000..0e259d4
--- /dev/null
+++ b/COPYING
@@ -0,0 +1,121 @@
+Creative Commons Legal Code
+
+CC0 1.0 Universal
+
+ CREATIVE COMMONS CORPORATION IS NOT A LAW FIRM AND DOES NOT PROVIDE
+ LEGAL SERVICES. DISTRIBUTION OF THIS DOCUMENT DOES NOT CREATE AN
+ ATTORNEY-CLIENT RELATIONSHIP. CREATIVE COMMONS PROVIDES THIS
+ INFORMATION ON AN "AS-IS" BASIS. CREATIVE COMMONS MAKES NO WARRANTIES
+ REGARDING THE USE OF THIS DOCUMENT OR THE INFORMATION OR WORKS
+ PROVIDED HEREUNDER, AND DISCLAIMS LIABILITY FOR DAMAGES RESULTING FROM
+ THE USE OF THIS DOCUMENT OR THE INFORMATION OR WORKS PROVIDED
+ HEREUNDER.
+
+Statement of Purpose
+
+The laws of most jurisdictions throughout the world automatically confer
+exclusive Copyright and Related Rights (defined below) upon the creator
+and subsequent owner(s) (each and all, an "owner") of an original work of
+authorship and/or a database (each, a "Work").
+
+Certain owners wish to permanently relinquish those rights to a Work for
+the purpose of contributing to a commons of creative, cultural and
+scientific works ("Commons") that the public can reliably and without fear
+of later claims of infringement build upon, modify, incorporate in other
+works, reuse and redistribute as freely as possible in any form whatsoever
+and for any purposes, including without limitation commercial purposes.
+These owners may contribute to the Commons to promote the ideal of a free
+culture and the further production of creative, cultural and scientific
+works, or to gain reputation or greater distribution for their Work in
+part through the use and efforts of others.
+
+For these and/or other purposes and motivations, and without any
+expectation of additional consideration or compensation, the person
+associating CC0 with a Work (the "Affirmer"), to the extent that he or she
+is an owner of Copyright and Related Rights in the Work, voluntarily
+elects to apply CC0 to the Work and publicly distribute the Work under its
+terms, with knowledge of his or her Copyright and Related Rights in the
+Work and the meaning and intended legal effect of CC0 on those rights.
+
+1. Copyright and Related Rights. A Work made available under CC0 may be
+protected by copyright and related or neighboring rights ("Copyright and
+Related Rights"). Copyright and Related Rights include, but are not
+limited to, the following:
+
+ i. the right to reproduce, adapt, distribute, perform, display,
+ communicate, and translate a Work;
+ ii. moral rights retained by the original author(s) and/or performer(s);
+iii. publicity and privacy rights pertaining to a person's image or
+ likeness depicted in a Work;
+ iv. rights protecting against unfair competition in regards to a Work,
+ subject to the limitations in paragraph 4(a), below;
+ v. rights protecting the extraction, dissemination, use and reuse of data
+ in a Work;
+ vi. database rights (such as those arising under Directive 96/9/EC of the
+ European Parliament and of the Council of 11 March 1996 on the legal
+ protection of databases, and under any national implementation
+ thereof, including any amended or successor version of such
+ directive); and
+vii. other similar, equivalent or corresponding rights throughout the
+ world based on applicable law or treaty, and any national
+ implementations thereof.
+
+2. Waiver. To the greatest extent permitted by, but not in contravention
+of, applicable law, Affirmer hereby overtly, fully, permanently,
+irrevocably and unconditionally waives, abandons, and surrenders all of
+Affirmer's Copyright and Related Rights and associated claims and causes
+of action, whether now known or unknown (including existing as well as
+future claims and causes of action), in the Work (i) in all territories
+worldwide, (ii) for the maximum duration provided by applicable law or
+treaty (including future time extensions), (iii) in any current or future
+medium and for any number of copies, and (iv) for any purpose whatsoever,
+including without limitation commercial, advertising or promotional
+purposes (the "Waiver"). Affirmer makes the Waiver for the benefit of each
+member of the public at large and to the detriment of Affirmer's heirs and
+successors, fully intending that such Waiver shall not be subject to
+revocation, rescission, cancellation, termination, or any other legal or
+equitable action to disrupt the quiet enjoyment of the Work by the public
+as contemplated by Affirmer's express Statement of Purpose.
+
+3. Public License Fallback. Should any part of the Waiver for any reason
+be judged legally invalid or ineffective under applicable law, then the
+Waiver shall be preserved to the maximum extent permitted taking into
+account Affirmer's express Statement of Purpose. In addition, to the
+extent the Waiver is so judged Affirmer hereby grants to each affected
+person a royalty-free, non transferable, non sublicensable, non exclusive,
+irrevocable and unconditional license to exercise Affirmer's Copyright and
+Related Rights in the Work (i) in all territories worldwide, (ii) for the
+maximum duration provided by applicable law or treaty (including future
+time extensions), (iii) in any current or future medium and for any number
+of copies, and (iv) for any purpose whatsoever, including without
+limitation commercial, advertising or promotional purposes (the
+"License"). The License shall be deemed effective as of the date CC0 was
+applied by Affirmer to the Work. Should any part of the License for any
+reason be judged legally invalid or ineffective under applicable law, such
+partial invalidity or ineffectiveness shall not invalidate the remainder
+of the License, and in such case Affirmer hereby affirms that he or she
+will not (i) exercise any of his or her remaining Copyright and Related
+Rights in the Work or (ii) assert any associated claims and causes of
+action with respect to the Work, in either case contrary to Affirmer's
+express Statement of Purpose.
+
+4. Limitations and Disclaimers.
+
+ a. No trademark or patent rights held by Affirmer are waived, abandoned,
+ surrendered, licensed or otherwise affected by this document.
+ b. Affirmer offers the Work as-is and makes no representations or
+ warranties of any kind concerning the Work, express, implied,
+ statutory or otherwise, including without limitation warranties of
+ title, merchantability, fitness for a particular purpose, non
+ infringement, or the absence of latent or other defects, accuracy, or
+ the present or absence of errors, whether or not discoverable, all to
+ the greatest extent permissible under applicable law.
+ c. Affirmer disclaims responsibility for clearing rights of other persons
+ that may apply to the Work or any use thereof, including without
+ limitation any person's Copyright and Related Rights in the Work.
+ Further, Affirmer disclaims responsibility for obtaining any necessary
+ consents, permissions or other rights required for any use of the
+ Work.
+ d. Affirmer understands and acknowledges that Creative Commons is not a
+ party to this document and has no duty or obligation with respect to
+ this CC0 or use of the Work.
diff --git a/ChangeLog b/ChangeLog
new file mode 100644
index 0000000..711cad7
--- /dev/null
+++ b/ChangeLog
@@ -0,0 +1,17 @@
+The changes are recorded by the version control system, Darcs. To see a log
+quickly from the terminal, run:
+
+ $ darcs changes --repo http://darcs.rel4tion.org/repos/text-position
+
+There is also a web interface at <http://darcs.rel4tion.org> which, among other
+things, can display the history log.
+
+To see the log in a local clone, first get a copy of the repository if you
+haven't yet:
+
+ $ darcs get http://darcs.rel4tion.org/repos/text-position
+
+Then move into the newly created directory and run darcs:
+
+ $ cd text-position
+ $ darcs changes
diff --git a/INSTALL b/INSTALL
new file mode 100644
index 0000000..7285e9a
--- /dev/null
+++ b/INSTALL
@@ -0,0 +1,13 @@
+Install from Hackage:
+
+ $ cabal install text-position
+
+Install from unpacked release tarball or source repo:
+
+ $ cd text-position
+ $ cabal install
+
+Just play with it without installing:
+
+ $ cabal build
+ $ cabal repl
diff --git a/NEWS b/NEWS
new file mode 100644
index 0000000..094f648
--- /dev/null
+++ b/NEWS
@@ -0,0 +1,18 @@
+text-position 0.1.0.0 -- (unreleased)
+======================================
+
+General, build and documentation changes:
+
+* (This is the first release, so everything is new)
+
+New APIs, features and enhancements:
+
+* (This is the first release, so everything is a new feature)
+
+Bug fixes:
+
+* (This is the first release and there's no bug tracking yet)
+
+Dependency changes:
+
+* (No dependencies at this point)
diff --git a/README b/README
new file mode 100644
index 0000000..d119890
--- /dev/null
+++ b/README
@@ -0,0 +1,20 @@
+This is a small Haskell library for tagging plain text and token streams with
+positions - line, column and character numbers. These are useful for locating
+lexer errors, debugging and more.
+
+See the .cabal file for more info and link to project website the version
+control.
+
+The official download locations are:
+
+- The project website
+- Torrents
+- Hackage
+
+This library is free software, and is committed to software freedom. It is
+released to the public domain using the CC0 Public Domain Dedication. For the
+boring "legal" details see the file 'COPYING'.
+
+See the file 'INSTALL' for hints on installation. The file 'ChangeLog' explains
+how to see the history log of the changes done in the code. 'NEWS' provides a
+friendly overview of the changes for each release.
diff --git a/Setup.hs b/Setup.hs
new file mode 100644
index 0000000..9a994af
--- /dev/null
+++ b/Setup.hs
@@ -0,0 +1,2 @@
+import Distribution.Simple
+main = defaultMain
diff --git a/src/Data/Position.hs b/src/Data/Position.hs
new file mode 100644
index 0000000..770d899
--- /dev/null
+++ b/src/Data/Position.hs
@@ -0,0 +1,24 @@
+{- This file is part of text-position.
+ -
+ - Written in 2015 by fr33domlover <fr33domlover@riseup.net>.
+ -
+ - ♡ Copying is an act of love. Please copy, reuse and share.
+ -
+ - To the extent possible under law, the author(s) have dedicated all copyright
+ - and related and neighboring rights to this software to the public domain
+ - worldwide. This software is distributed without any warranty.
+ -
+ - You should have received a copy of the CC0 Public Domain Dedication along
+ - with this software. If not, see
+ - <http://creativecommons.org/publicdomain/zero/1.0/>.
+ -}
+
+module Data.Position (
+ -- * Types
+ module Data.Position.Types
+ , module Data.Position.Interface
+ )
+where
+
+import Data.Position.Types
+import Data.Position.Interface
diff --git a/src/Data/Position/Interface.hs b/src/Data/Position/Interface.hs
new file mode 100644
index 0000000..8be0cc4
--- /dev/null
+++ b/src/Data/Position/Interface.hs
@@ -0,0 +1,352 @@
+{- This file is part of text-position.
+ -
+ - Written in 2015 by fr33domlover <fr33domlover@riseup.net>.
+ -
+ - ♡ Copying is an act of love. Please copy, reuse and share.
+ -
+ - To the extent possible under law, the author(s) have dedicated all copyright
+ - and related and neighboring rights to this software to the public domain
+ - worldwide. This software is distributed without any warranty.
+ -
+ - You should have received a copy of the CC0 Public Domain Dedication along
+ - with this software. If not, see
+ - <http://creativecommons.org/publicdomain/zero/1.0/>.
+ -}
+
+module Data.Position.Interface (
+ -- * Special Positions
+ zeroPosition
+ , firstPosition
+ -- * Special Advances
+ , emptyAdvance
+ , defaultAdvance
+ -- * Creating Advances
+ , psymAdvance
+ , symAdvance
+ , linecharAdvance
+ , stringAdvance
+ , newlineAdvance
+ , commonAdvance
+ , (<++>)
+ -- * Applying Advances
+ , tryAdvance
+ , tryAdvanceC
+ , advance
+ , advanceC
+ -- * Utilities Based on Advances
+ , defaultAnnotate
+ , enrichOnce
+ , enrichOnceD
+ , enrich
+ , enrichD
+ , bless
+ , tokens
+ , textInfo
+ )
+where
+
+import Data.List (isPrefixOf, mapAccumL)
+import Data.Maybe (fromJust, maybe)
+import Data.Position.Types
+import Text.Regex.Applicative
+
+-------------------------------------------------------------------------------
+-- Special Positions
+-------------------------------------------------------------------------------
+
+-- | The position before the first character in a file, to be used as an
+-- initial value before reading actual characters.
+zeroPosition :: Position
+zeroPosition = Position 1 0 0
+
+-- | The position of the first character in a file.
+firstPosition :: Position
+firstPosition = Position 1 1 1
+
+-------------------------------------------------------------------------------
+-- Special Advances
+-------------------------------------------------------------------------------
+
+-- | The zero advance. It doesn't match any input and doesn't consume any
+-- characters. Applying it doesn't change the position.
+emptyAdvance :: Advance s
+emptyAdvance = empty
+
+-- | The default advance when reading a character, e.g. a letter or a digit.
+-- The new character would have column number higher by 1, and character index
+-- higher by once (advances by 1 for each character read). The pattern accepts
+-- any single character.
+defaultAdvance :: Advance s
+defaultAdvance = f <$ anySym
+ where
+ f (Position l c ch) = Position l (c + 1) (ch + 1)
+
+-------------------------------------------------------------------------------
+-- Creating Advances
+-------------------------------------------------------------------------------
+
+-- | Create an advance for a single character based on a predicate.
+psymAdvance :: (s -> Bool) -> (Position -> Position) -> Advance s
+psymAdvance p a = a <$ psym p
+
+-- | Create an advance for the given character.
+symAdvance :: Eq s => s -> (Position -> Position) -> Advance s
+symAdvance c = psymAdvance (c ==)
+
+-- | Create an advance for a line character with the specified width. This is
+-- mainly useful for tabs and perhaps the various space characters in Unicode.
+-- Example for tab:
+--
+-- > tabAdv = linecharAdvance '\t' 8
+linecharAdvance :: Eq s
+ => s -- ^ The character
+ -> Int -- ^ How many columns the character takes
+ -> Advance s
+linecharAdvance c width = symAdvance c f
+ where
+ f (Position l c ch) = Position l (c + width) (ch + 1)
+
+-- | Create an advance for the given character sequence.
+stringAdvance :: Eq s => [s] -> (Position -> Position) -> Advance s
+stringAdvance s a = a <$ string s
+
+-- | Create an advance for a character or sequence of characters expressing a
+-- newline, i.e. starting a new line. As the advance expresses the position
+-- /after/ the character, applying the advance results with a position at
+-- column 1.
+newlineAdvance :: Eq s => [s] -> Advance s
+newlineAdvance s = stringAdvance s f
+ where
+ f (Position l c ch) = Position (l + 1) 1 (ch + length s)
+
+-- | Create a set of common advances supporting tabs and newlines. More
+-- advances can easily be added by @<|>@ing them to the result. The result
+-- doesn't include the default advance.
+commonAdvance :: Int -- ^ Tab width (usually 2, 4 or 8)
+ -> Bool -- ^ Whether carriage return (CR) counts as a newline
+ -> Bool -- ^ Whether linefeed (LF) counts as a newline
+ -> Bool -- ^ Whether the sequence CR LF counts as a newline
+ -> Bool -- ^ Whether formfeed (FF) counts as a newline
+ -> Advance Char
+commonAdvance tab cr lf crlf ff = foldr (<|>) tabAdv nlAdv
+ where
+ tabAdv = linecharAdvance '\t' tab
+ nlAdv = [ adv | (adv, True) <- zipList ]
+ zipList = zip (map newlineAdvance ["\r\n", "\r", "\n", "\f"])
+ [crlf, cr, lf, ff]
+
+-- | Concatenate two advances into a single advance accepting their patterns
+-- in order, and applying the advances on top of each other. For example,
+-- concatenating an advance for @'a'@ and an advance for @'b'@ results with an
+-- advance accepting @"ab"@ and moving the position 2 columns forward.
+(<++>) :: Advance s -> Advance s -> Advance s
+a <++> b = flip (.) <$> a <*> b
+infixl 4 <++>
+
+-------------------------------------------------------------------------------
+-- Applying Advances
+-------------------------------------------------------------------------------
+
+-- | Given a list of remaining characters to read, the next position in the
+-- file and a set of advance rules, try to consume characters once and
+-- determine what is the next position after reading them. Example:
+--
+-- >>> tryAdvance defaultAdvance (Position 1 1 1) "abc"
+-- (Position 1 2 2,"bc")
+--
+-- If there is no match, it returns the input position and the input list, i.e.
+-- no characters will be consumed.
+tryAdvance :: Advance s -> Position -> [s] -> (Position, [s])
+tryAdvance a p l =
+ case findFirstPrefix a l of
+ Nothing -> (p, l)
+ Just (adv, rest) -> (adv p, rest)
+
+-- | Like 'tryAdvance', but reads one character at most. In the general case
+-- you'll want to use 'tryAdvance', because 'tryAdvanceC' breaks chains. For
+-- example, while 'tryAdvance' can recognize @"\r\n"@ as a single newline,
+-- 'tryAdvanceC' will consume only the @'\r'@, splitting the string into 2
+-- newlines.
+--
+-- If there is no match, the input position is returned.
+tryAdvanceC :: Advance s -> Position -> s -> Position
+tryAdvanceC a p s = fst $ tryAdvance a p [s]
+
+-- | Given a list of remaining characters to read, the next position in the
+-- file and a set of advance rules, consume characters once and determine what
+-- is the next position after reading them.
+--
+-- The 'defaultAdvance' is appended (using '<|>') to the given advance.
+-- Therefore, if the given list isn't empty, at leat character will be
+-- consumed. The intended use is to encode all the special cases (tab,
+-- newlines, non-spacing marks, etc.) in the given advance, and let the
+-- 'defaultAdvance' catch the rest.
+advance :: Advance s -> Position -> [s] -> (Position, [s])
+advance a = tryAdvance (a <|> defaultAdvance)
+
+-- | Like 'advance', but reads exactly one character. Patterns which require
+-- more than one character fail to match. Like 'tryAdvanceC', but has the
+-- 'defaultAdvance' appended, which means is always consumes given a non-empty
+-- list.
+advanceC :: Advance s -> Position -> s -> Position
+advanceC a p s = fst $ advance a p [s]
+
+-------------------------------------------------------------------------------
+-- Utilities based on Advances
+-------------------------------------------------------------------------------
+
+-- | Given the next position and a list matched there, annotate the symbols
+-- with position information. For a single character, it is simply the given
+-- position. For a sequence, this annotation assigns all the symbols the same
+-- line and column, incrementing only the character index.
+--
+-- >>> defaultAnnotate (Position 1 1 1) "a"
+-- [Positioned 'a' (Position 1 1 1)]
+--
+-- >>> defaultAnnotate (Position 1 1 1) "\r\n"
+-- [Positioned '\r' (Position 1 1 1), Positioned '\n' (Position 1 1 2)]
+--
+-- The last example would give the same positions to any list of the same
+-- length, e.g. @"ab"@ instead of @"\r\n"@.
+defaultAnnotate :: Position -> [s] -> [Positioned s]
+defaultAnnotate p [] = []
+defaultAnnotate p (c:cs) = Positioned c p : defaultAnnotate (f p) cs
+ where
+ f (Position l c ch) = Position l c (ch + 1)
+
+-- | Given an advance rule, the next available position and a symbol list,
+-- consume symbols once. Return a list of them, annotated with position
+-- information, as well as the next position and the rest of the input.
+-- On empty input, return @[]@, the given position and the input list.
+--
+-- If more than one character is matched, the sequence is annotated with
+-- consecutive character indices, but with the same line and column.
+--
+-- >>> enrichOnce (newlineAdvance "\r\n") (Position 1 1 1) "\r\nhello"
+-- ( [ Positioned '\r' (Position 1 1 1)
+-- , Positioned '\n' (Position 1 1 2)
+-- ]
+-- , Position 2 1 3
+-- , "hello"
+-- )
+enrichOnce :: Advance s -> Position -> [s] -> ([Positioned s], Position, [s])
+enrichOnce = enrichOnceD defaultAnnotate defaultAdvance
+
+-- | Given an advance rule, the next available position and a symbol list, try
+-- to consume symbols once. If consumed, return a list of them, annotated with
+-- position information, as well as the next position and the rest of the
+-- input. Otherwise, return @[]@, the given position and the input list.
+--
+-- If more than one character is matched, the sequence is annotated using the
+-- function passed as the first parameter.
+--
+-- >>> let ann = defaultAnnotate; adv = empty
+-- >>> enrichOnceD ann adv (newlineAdvance "\r\n") (Position 1 1 1) "\r\nhello"
+-- ( [ Positioned '\r' (Position 1 1 1)
+-- , Positioned '\n' (Position 1 1 2)
+-- ]
+-- , Position 2 1 3
+-- , "hello"
+-- )
+enrichOnceD :: (Position -> [s] -> [Positioned s]) -- ^ annotation function
+ -> Advance s -- ^ default advance
+ -> Advance s -- ^ advance rule
+ -> Position -- ^ initial position
+ -> [s] -- ^ input list
+ -> ([Positioned s], Position, [s])
+enrichOnceD ann def adv pos syms = f $ findFirstPrefix re syms
+ where
+ re = g <$> withMatched (adv <|> def)
+ g (apply, l) = (apply pos, ann pos l)
+ f Nothing = ([], firstPosition, syms)
+ f (Just ((p, s), rest)) = (s, p, rest)
+
+-- | Given a list of symbols, annotate it with position based on advance rules.
+-- Each symbol is annotated with its position in the text. In addition to the
+-- annotated list, the next available position is returned (i.e. the position
+-- of the next symbol, if another symbol were appended to the list).
+--
+-- >>> enrich defaultAdvance "abc"
+-- ( [ Positioned 'a' (Position 1 1 1))
+-- , Positioned 'b' (Position 1 2 2))
+-- ]
+-- , Position 1 3 3
+-- )
+--
+-- It is implemented using the 'defaultAdvance' as a default, i.e. the entire
+-- list is always consumed.
+enrich :: Advance s -> [s] -> ([Positioned s], Position)
+enrich adv = f . enrichD defaultAnnotate defaultAdvance adv
+ where
+ f (ps, p, _) = (ps, p)
+
+-- | Like 'enrich', but takes an annotation function as the first parameter,
+-- and a default advance as the second parameter. The rest of the parameters
+-- are the same ones 'enrich' takes. It allows using custom defaults. To have
+-- no default advance, pass 'empty'.
+--
+-- Since a match of the whole list isn't guaranteed, there is an additional
+-- list in the return type, containing the rest of the input. If the entire
+-- input is matched, that list will be @[]@. If no input is matched at all,
+-- the annotated list is @[]@, the position is 'firstPosition' and the
+-- additional list (rest of input) is the input list.
+enrichD :: (Position -> [s] -> [Positioned s])
+ -> Advance s
+ -> Advance s
+ -> [s]
+ -> ([Positioned s], Position, [s])
+enrichD ann def adv syms = f ([], firstPosition, syms)
+ where
+ g = enrichOnceD ann def adv
+ f acc@(ps, p, s) =
+ let (ps', p', s') = g p s
+ in if null ps' then acc else f (ps ++ ps', p', s')
+
+-- | Given a regex, create an equivalent position-aware regex. The resulting
+-- regex reads position-tagged symbols, and returns a position-tagged result.
+bless :: RE s a -> PosRE s a
+bless re = g <$> withMatched (comap f re)
+ where
+ f (Positioned c _) = c
+ g (val, []) = Positioned val zeroPosition
+ g (val, Positioned _ p : _) = Positioned val p
+
+-- | Tokenize an input list and get list of tokens. If there was an error (no
+-- regex match), get the text position at which it happened.
+tokens :: Advance s -- ^ Advance rule for position tagging, e.g. made with
+ -- 'commonAdvance'
+ -> RE s a -- ^ Regex which selects and returns a single token
+ -> [s] -- ^ Input list of symbols
+ -> ( [Positioned a]
+ , Maybe (Positioned s)
+ ) -- ^ List of tokens matched. If the entire input was
+ -- matched, the second element is 'Nothing'. Otherwise,
+ -- it is the (position-tagged) symbol at which matching
+ -- failed.
+tokens adv re syms =
+ let re' = many $ bless re
+ syms' = fst $ enrich adv syms
+ in case findFirstPrefix re' syms' of
+ Nothing -> ([], Just $ head syms')
+ Just (list, []) -> (list, Nothing)
+ Just (list, x:_) -> (list, Just x)
+
+-- | Get some numbers describing the given text (list of symbols):
+--
+-- * The total number of lines
+-- * The length (number of columns) of the last line
+-- * The total number of characters
+--
+-- Note that this probably isn't the fastest implementation. It's possible to
+-- compute directly by counting the lines and the characters. This function is
+-- here anyway, as a demonstration of using this library.
+--
+-- >>> let adv = commonAdvance 4 True True True True
+-- >>> textInfo adv "Hello world!\nHow are you?\nWonderful!"
+-- (3,11,36)
+textInfo :: Advance s -> [s] -> (Int, Int, Int)
+textInfo adv syms = g $ f <$> many (adv <|> defaultAdvance)
+ where
+ f flist = h $ foldl (flip (.)) id flist $ firstPosition
+ g re = fromJust $ match re syms
+ h (Position l c ch) = (l, c - 1, ch - 1)
diff --git a/src/Data/Position/Types.hs b/src/Data/Position/Types.hs
new file mode 100644
index 0000000..541b81a
--- /dev/null
+++ b/src/Data/Position/Types.hs
@@ -0,0 +1,55 @@
+{- This file is part of text-position.
+ -
+ - Written in 2015 by fr33domlover <fr33domlover@riseup.net>.
+ -
+ - ♡ Copying is an act of love. Please copy, reuse and share.
+ -
+ - To the extent possible under law, the author(s) have dedicated all copyright
+ - and related and neighboring rights to this software to the public domain
+ - worldwide. This software is distributed without any warranty.
+ -
+ - You should have received a copy of the CC0 Public Domain Dedication along
+ - with this software. If not, see
+ - <http://creativecommons.org/publicdomain/zero/1.0/>.
+ -}
+
+module Data.Position.Types
+ ( Position (..)
+ , Advance
+ , Positioned (..)
+ , PosRE
+ )
+where
+
+import Text.Regex.Applicative
+
+-- | Represents a position in a text. The intended usage is holding the next
+-- available position in a file. In other words: If a character would be
+-- appended to the file, what its position would be.
+data Position = Position
+ { line :: Int -- ^ Line number, start counting from 1
+ , column :: Int -- ^ Column number, start counting from 1
+ , char :: Int -- ^ Character index (count of characters in the file so
+ -- far), start counting from 1
+ }
+ deriving (Show, Eq)
+
+-- | Represents an advancement of the /next available position/ marker due to
+-- reading a character. For example, the letter A moves forward by one column,
+-- while linefeed (@'\n'@) moves to the beginning of the next line.
+--
+-- The character type is a type parameter.
+--
+-- An advance includes a pattern and a change. The pattern determines to which
+-- characters, or character sequences, this advance applies. The change
+-- determines how to advance the position in the pattern is matched. It can
+-- also choose different advances depending on the match, e.g. "move 1 column
+-- if matched "a" and move 4 columns if matched "\t".
+type Advance s = RE s (Position -> Position)
+
+-- | A value with a position attached.
+data Positioned a = Positioned a Position deriving (Show, Eq)
+
+-- | Applicative regex ("Text.Regex.Applicative") which takes position-tagged
+-- symbols and returns a position-tagged result.
+type PosRE s a = RE (Positioned s) (Positioned a)
diff --git a/test/test.hs b/test/test.hs
new file mode 100644
index 0000000..69fd997
--- /dev/null
+++ b/test/test.hs
@@ -0,0 +1,153 @@
+{- This file is part of text-position.
+ -
+ - Written in 2015 by fr33domlover <fr33domlover@riseup.net>.
+ -
+ - ♡ Copying is an act of love. Please copy, reuse and share.
+ -
+ - To the extent possible under law, the author(s) have dedicated all copyright
+ - and related and neighboring rights to this software to the public domain
+ - worldwide. This software is distributed without any warranty.
+ -
+ - You should have received a copy of the CC0 Public Domain Dedication along
+ - with this software. If not, see
+ - <http://creativecommons.org/publicdomain/zero/1.0/>.
+ -}
+
+import Control.Monad (unless)
+import Data.Position
+import System.Exit
+import Test.QuickCheck
+import Test.QuickCheck.Test
+import Text.Printf
+import Text.Regex.Applicative
+
+main :: IO ()
+main = do
+ results <- mapM (\ (name ,action) -> printf "%-25s: " name >> action) tests
+ unless (all isSuccess results) exitFailure
+
+prop_defaultAdvance1 :: Char -> String -> Int -> Int -> Int -> Bool
+prop_defaultAdvance1 s ss l c ch =
+ case findFirstPrefix defaultAdvance (s:ss) of
+ Nothing -> False
+ Just (adv, rest) -> rest == ss &&
+ adv (Position l c ch) == Position l (c+1) (ch+1)
+
+prop_defaultAdvance2 :: Bool
+prop_defaultAdvance2 =
+ case findFirstPrefix defaultAdvance [] of
+ Nothing -> True
+ Just _ -> False
+
+prop_linecharAdvance :: Char -> Int -> String -> Int -> Int -> Int -> Bool
+prop_linecharAdvance t w s l c ch =
+ case findFirstPrefix (linecharAdvance t w) (t:s) of
+ Nothing -> False
+ Just (adv, rest) -> rest == s &&
+ adv (Position l c ch) == Position l (c+w) (ch+1)
+
+prop_concat :: Char -> String -> Char -> String -> String
+ -> Int -> Int -> Int
+ -> Bool
+prop_concat s1 ss1 s2 ss2 ss l c ch =
+ case findFirstPrefix (adv1 <++> adv2) (s1:ss1 ++ s2:ss2 ++ ss) of
+ Nothing -> False
+ Just (adv, rest) -> rest == ss &&
+ adv (Position l c ch) == Position (l*8-1) (c*5+4) (ch*7-2)
+ where
+ a1 (Position l' c' ch') = Position (l'*8) (c'*5) (ch'*7)
+ a2 (Position l' c' ch') = Position (l'-1) (c'+4) (ch'-2)
+ adv1 = stringAdvance (s1:ss1) a1
+ adv2 = stringAdvance (s2:ss2) a2
+
+prop_enrichOnce :: Bool
+prop_enrichOnce = a == b
+ where
+ a = enrichOnce (newlineAdvance "\r\n") (Position 1 1 1) "\r\nhello"
+ b =
+ ( [ Positioned '\r' (Position 1 1 1)
+ , Positioned '\n' (Position 1 1 2)
+ ]
+ , Position 2 1 3
+ , "hello"
+ )
+
+prop_enrich :: Bool
+prop_enrich = a == b
+ where
+ a = enrich (commonAdvance 8 True True True True) "1\t4\r\n\r2\f8\t\n"
+ b =
+ ( [ Positioned '1' (Position 1 1 1)
+ , Positioned '\t' (Position 1 2 2)
+ , Positioned '4' (Position 1 10 3)
+ , Positioned '\r' (Position 1 11 4)
+ , Positioned '\n' (Position 1 11 5)
+ , Positioned '\r' (Position 2 1 6)
+ , Positioned '2' (Position 3 1 7)
+ , Positioned '\f' (Position 3 2 8)
+ , Positioned '8' (Position 4 1 9)
+ , Positioned '\t' (Position 4 2 10)
+ , Positioned '\n' (Position 4 10 11)
+ ]
+ , Position 5 1 12
+ )
+
+prop_bless :: Bool
+prop_bless = a == b
+ where
+ a = match re $ fst $ enrich (commonAdvance 8 True True True True) s
+ s = "helloB\tE\r\nB\tEhello\r\n\r\nB\tE"
+ re = some $ bless $ 'H' <$ string "hello"
+ <|> 'T' <$ string "B\tE"
+ <|> 'N' <$ string "\r\n"
+ b = Just
+ [ Positioned 'H' (Position 1 1 1)
+ , Positioned 'T' (Position 1 6 6)
+ , Positioned 'N' (Position 1 16 9)
+ , Positioned 'T' (Position 2 1 11)
+ , Positioned 'H' (Position 2 11 14)
+ , Positioned 'N' (Position 2 16 19)
+ , Positioned 'N' (Position 3 1 21)
+ , Positioned 'T' (Position 4 1 23)
+ ]
+
+comadv = commonAdvance 8 True True True True
+tokenize = tokens comadv re
+re = 'H' <$ string "hello"
+ <|> 'T' <$ string "B\tE"
+ <|> 'N' <$ string "\r\n"
+input = "helloB\tE\r\nB\tEhello\r\n\r\nB\tE"
+positions =
+ [ Positioned 'H' (Position 1 1 1)
+ , Positioned 'T' (Position 1 6 6)
+ , Positioned 'N' (Position 1 16 9)
+ , Positioned 'T' (Position 2 1 11)
+ , Positioned 'H' (Position 2 11 14)
+ , Positioned 'N' (Position 2 16 19)
+ , Positioned 'N' (Position 3 1 21)
+ , Positioned 'T' (Position 4 1 23)
+ ]
+
+prop_tokens1 :: Bool
+prop_tokens1 = tokenize input == (positions, Nothing)
+
+prop_tokens2 :: Bool
+prop_tokens2 = tokenize (input ++ "world") ==
+ (positions, Just $ Positioned 'w' (Position 4 11 26))
+
+prop_textInfo :: Bool
+prop_textInfo = textInfo comadv input == (4, 10, 25)
+
+tests :: [(String, IO Result)]
+tests =
+ [ ("defaultAdvance 1", quickCheckResult prop_defaultAdvance1)
+ , ("defaultAdvance 2", quickCheckResult prop_defaultAdvance2)
+ , ("linecharAdvance", quickCheckResult prop_linecharAdvance)
+ , ("<++>", quickCheckResult prop_concat)
+ , ("enrichOnce", quickCheckResult prop_enrichOnce)
+ , ("enrich", quickCheckResult prop_enrich)
+ , ("bless", quickCheckResult prop_bless)
+ , ("tokens1", quickCheckResult prop_tokens1)
+ , ("tokens2", quickCheckResult prop_tokens2)
+ , ("textInfo", quickCheckResult prop_textInfo)
+ ]
diff --git a/text-position.cabal b/text-position.cabal
new file mode 100644
index 0000000..7f0a52d
--- /dev/null
+++ b/text-position.cabal
@@ -0,0 +1,43 @@
+name: text-position
+version: 0.1.0.0
+synopsis: Handling positions in text and position-tagging it.
+description: This package provides tools for tagging text with
+ positions (line, column, character), and getting position
+ aware tokens from lexical analysis (see regex-applicative
+ package). It is based on Advances, a thin abstraction over
+ regular expressions.
+homepage: http://rel4tion.org/projects/text-position/
+bug-reports: http://rel4tion.org/projects/text-position/tickets/
+license: PublicDomain
+license-file: COPYING
+author: fr33domlover
+maintainer: fr33domlover@riseup.net
+copyright: ♡ Copying is an act of love. Please copy, reuse and share.
+category: Data, Text
+build-type: Simple
+extra-source-files: AUTHORS ChangeLog COPYING INSTALL NEWS README
+cabal-version: >=1.10
+
+source-repository head
+ type: darcs
+ location: http://darcs.rel4tion.org/repos/text-position/
+
+library
+ exposed-modules: Data.Position
+ other-modules: Data.Position.Interface
+ , Data.Position.Types
+ -- other-extensions:
+ build-depends: base ==4.7.*
+ , regex-applicative ==0.3.*
+ hs-source-dirs: src
+ default-language: Haskell2010
+
+test-suite test
+ default-language: Haskell2010
+ type: exitcode-stdio-1.0
+ hs-source-dirs: test
+ main-is: test.hs
+ build-depends: base ==4.7.*
+ , QuickCheck >=2.8
+ , regex-applicative ==0.3.*
+ , text-position