Stag Scripts

These scripts come with the stag and dbstag distributions.

Data::Stag Script List


stag-autoschema.pl

writes the implicit stag-schema for a stag file
stag-autoschema.pl -w sxpr sample-data.xml
Takes a stag compatible file (xml, sxpr, itext), or a file in any
format plus a parser, and writes out the implicit underlying stag-schema.

A stag-schema should be largely self-explanatory.

Here is an example stag-schema, shown in sxpr syntax:
(db
   (person*
    (name "s")
    (address+
     (address_type "s")
     (street "s")
     (street2? "s")
     (city "s")
     (zip? "s"))))

The database db contains zero or more persons, each person has a
mandatory name and at least one address.

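The cardinality mnemonics are as follows:

  +  one or more
  ?  zero or one
  *  zero or more

An element with no suffix, such as name above, occurs exactly once.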
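To make the idea of an "implicit" schema concrete, the following is a minimal Python sketch of how cardinality suffixes could be inferred from the data alone. It is not the stag-autoschema.pl implementation; the function name infer_cardinality and the sample document are assumptions made for illustration, and parents are grouped by tag name only rather than by full path.

```python
import xml.etree.ElementTree as ET
from collections import Counter, defaultdict

# Hypothetical sample data, loosely following the schema example above.
SAMPLE = """<db>
  <person><name>a</name>
    <address><city>x</city></address>
    <address><city>y</city><zip>1</zip></address>
  </person>
  <person><name>b</name>
    <address><city>z</city></address>
  </person>
</db>"""

def infer_cardinality(root):
    """Map (parent tag, child tag) to '', '?', '+' or '*'."""
    # parent tag -> one Counter of child tags per parent instance
    per_parent = defaultdict(list)
    for node in root.iter():
        per_parent[node.tag].append(Counter(child.tag for child in node))
    # keyed by (sometimes absent, sometimes repeated)
    suffixes = {(False, False): "",   # exactly one
                (True, False): "?",   # zero or one
                (False, True): "+",   # one or more
                (True, True): "*"}    # zero or more
    result = {}
    for parent, instances in per_parent.items():
        for child in sorted(set().union(*instances)):
            counts = [c[child] for c in instances]
            result[(parent, child)] = suffixes[(min(counts) == 0, max(counts) > 1)]
    return result

cardinality = infer_cardinality(ET.fromstring(SAMPLE))
for (parent, child), suffix in cardinality.items():
    print(f"{parent} -> {child}{suffix}")
```

Because the suffixes are derived from the observed data, a schema inferred this way can be stricter than the intended one: here db contains two persons, so the sketch reports person+ rather than person*.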

stag-db.pl

persistent storage and retrieval for stag data (xml, sxpr, itext)
stag-db.pl -r person -k social_security_no -i ./person-idx myrecords.xml
  stag-db.pl -i ./person-idx -q 999-9999-9999 -q 888-8888-8888
Builds a simple file-based database for persistent storage and
retrieval of nodes from a stag compatible document.

Imagine you have a very large file of data, in a stag compatible
format such as XML. You want to index all the elements of type
person; each person can be uniquely identified by
social_security_no, which is a direct subnode of person.

The first thing to do is to build an index file, which will be stored
in your current directory:
stag-db.pl -r person -k social_security_no -i ./person-idx myrecords.xml

You can then use the index "person-idx" to retrieve person nodes by
their social security number:

  stag-db.pl -i ./person-idx -q 999-9999-9999 > some-person.xml

You can export using different stag formats

  stag-db.pl -i ./person-idx -q 999-9999-9999 -w sxpr > some-person.sxpr

You can retrieve multiple nodes (although these need to be rooted to
make a valid file)

  stag-db.pl -i ./person-idx -q 999-9999-9999 -q 888-8888-8888 -top personset

Or you can use a list of IDs from a file (newline delimited)

  stag-db.pl -i ./person-idx -qf my_ss_nmbrs.txt -top personset
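The index-then-query workflow above can be sketched in Python as follows. This is only an illustration of the concept, assuming an in-memory dict in place of the on-disk index that stag-db.pl builds; build_index, query and the sample records are hypothetical names and data.

```python
import xml.etree.ElementTree as ET

# Hypothetical records; written without whitespace between elements.
SAMPLE = (
    "<records>"
    "<person><social_security_no>999-9999-9999</social_security_no><name>fred</name></person>"
    "<person><social_security_no>888-8888-8888</social_security_no><name>jim</name></person>"
    "<person><social_security_no>777-7777-7777</social_security_no><name>ann</name></person>"
    "</records>"
)

def build_index(xml_text, record, key):
    """Map each record's key value to the serialized record element."""
    index = {}
    for node in ET.fromstring(xml_text).iter(record):
        value = node.findtext(key)      # the key must be a direct subnode
        if value is not None:
            index[value.strip()] = ET.tostring(node, encoding="unicode")
    return index

def query(index, keys, top):
    """Return the requested records wrapped in a single root element."""
    wrapper = ET.Element(top)
    for key in keys:
        if key in index:
            wrapper.append(ET.fromstring(index[key]))
    return ET.tostring(wrapper, encoding="unicode")

index = build_index(SAMPLE, "person", "social_security_no")
result = query(index, ["999-9999-9999", "888-8888-8888"], "personset")
print(result)
```

The wrapper element plays the role of the -top argument: several retrieved nodes only form a valid document once they share a single root.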

stag-diff.pl

finds the difference between two stag files
stag-diff.pl -ignore foo-id -ignore bar-id file1.xml file2.xml
Compares two data trees and reports whether they match. If they do not
match, the mismatch is reported.
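As a rough illustration of comparing two trees while ignoring selected elements, here is a minimal Python sketch. It is an assumption about the general technique, not a description of how stag-diff.pl reports differences; first_mismatch and the sample documents are hypothetical, and child order is treated as significant.

```python
import xml.etree.ElementTree as ET

def first_mismatch(a, b, ignore=(), path=""):
    """Return a description of the first difference, or None if the trees match."""
    if a.tag != b.tag:
        return f"{path}: <{a.tag}> vs <{b.tag}>"
    here = f"{path}/{a.tag}"
    if (a.text or "").strip() != (b.text or "").strip():
        return f"{here}: text {a.text!r} vs {b.text!r}"
    # Drop ignored child elements before comparing, in document order.
    kids_a = [c for c in a if c.tag not in ignore]
    kids_b = [c for c in b if c.tag not in ignore]
    if len(kids_a) != len(kids_b):
        return f"{here}: {len(kids_a)} vs {len(kids_b)} child nodes"
    for child_a, child_b in zip(kids_a, kids_b):
        found = first_mismatch(child_a, child_b, ignore, here)
        if found:
            return found
    return None

doc1 = ET.fromstring("<db><person><foo-id>1</foo-id><name>fred</name></person></db>")
doc2 = ET.fromstring("<db><person><foo-id>2</foo-id><name>fred</name></person></db>")
doc3 = ET.fromstring("<db><person><foo-id>1</foo-id><name>jim</name></person></db>")

same = first_mismatch(doc1, doc2, ignore={"foo-id"})
strict = first_mismatch(doc1, doc2)
different = first_mismatch(doc1, doc3, ignore={"foo-id"})
print(same, strict, different, sep="\n")
```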

stag-drawtree.pl

draws a stag file (xml, itext, sxpr) as a PNG diagram
stag-drawtree.pl -o my.png myfile.xml

  stag-drawtree.pl -p My::MyFormatParser -o my.png myfile.myfmt
Requires the GD library and the GD Perl module.

stag-eval.pl

stag-eval.pl '' file2.xml

    

stag-filter.pl

filters a stag file (xml, itext, sxpr) for nodes of interest
stag-filter.pl person -q name=fred file1.xml

  stag-filter.pl person 'sub {shift->get_name =~ /^A/}' file1.xml

  stag-filter.pl -p My::Foo -w sxpr record 'sub{..}' file2
Parses an input file using the specified parser (which may be a
built-in stag parser, such as xml) and filters the resulting stag tree
according to a user-supplied subroutine, writing out only the
nodes/elements that pass the test.

The parser is event based, so it should be able to handle large files
(although if the node you parse is large, it will take up more memory).
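The event-based filtering described above can be sketched in Python with a streaming parser. This is an analogy rather than the stag-filter.pl implementation; filter_nodes and the sample data are hypothetical, and the sketch assumes the record element is not nested inside itself.

```python
import io
import xml.etree.ElementTree as ET

SAMPLE = (b"<db><person><name>fred</name></person>"
          b"<person><name>Alice</name></person>"
          b"<person><name>Anna</name></person></db>")

def filter_nodes(source, tag, test):
    """Yield the serialized form of each <tag> element that passes test()."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != tag:
            continue
        if test(elem):
            yield ET.tostring(elem, encoding="unicode")
        # Discard the record's contents once it has been handled, so the
        # largest single record dominates memory use rather than the file.
        elem.clear()

matches = list(filter_nodes(io.BytesIO(SAMPLE), "person",
                            lambda p: (p.findtext("name") or "").startswith("A")))
for xml in matches:
    print(xml)
```

The test function receives one complete record at a time, which is why a very large record still has to fit in memory even though the file as a whole does not.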

stag-findsubtree.pl

finds nodes in a stag file
stag-findsubtree.pl 'person/name' file.xml
Parses an input file and writes out the matching subnodes.

stag-flatten.pl

turns stag data into a flat table
stag-flatten.pl MyFile.xml dept/name dept/person/name
Reads in a file in a stag format and 'flattens' it into a tab-delimited
table. Given this data:
(company
   (dept
    (name "special-operations")
    (person
     (name "james-bond"))
    (person
     (name "fred"))))

The above command will return a two-column table:

  special-operations      james-bond
  special-operations      fred
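One way to picture the flattening is as a cross product of the values found along each requested path. The Python sketch below works on an XML rendering of the same company data; it is an illustration under assumptions, not the stag-flatten.pl algorithm. The flatten function is hypothetical, and a record that lacks one of the requested elements is simply dropped, which may differ from the script's behaviour.

```python
from itertools import product
import xml.etree.ElementTree as ET

SAMPLE = ("<company><dept><name>special-operations</name>"
          "<person><name>james-bond</name></person>"
          "<person><name>fred</name></person></dept></company>")

def flatten(node, paths):
    """Return one tuple per row; each path is a tuple of tags relative to node."""
    groups = {}                      # first tag -> [(column index, rest of path)]
    for col, path in enumerate(paths):
        groups.setdefault(path[0], []).append((col, path[1:]))
    partials = []                    # per first tag: list of {column: value}
    for head, members in groups.items():
        nested = [(col, rest) for col, rest in members if rest]
        alternatives = []
        for child in node.findall(head):
            leaf = {col: child.text for col, rest in members if not rest}
            if not nested:
                alternatives.append(leaf)
                continue
            # Recurse for the columns that continue below this child.
            for sub in flatten(child, [rest for _, rest in nested]):
                row = dict(leaf)
                row.update({col: v for (col, _), v in zip(nested, sub)})
                alternatives.append(row)
        partials.append(alternatives)
    rows = []
    for combo in product(*partials):  # combine sibling groups
        merged = {}
        for part in combo:
            merged.update(part)
        rows.append(tuple(merged[c] for c in range(len(paths))))
    return rows

rows = flatten(ET.fromstring(SAMPLE), [("dept", "name"), ("dept", "person", "name")])
for row in rows:
    print("\t".join(row))
```

The department name is repeated on every row because it sits above the person nodes in the tree, which is the same denormalisation seen in the table above.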

stag-grep.pl

filters a stag file (xml, itext, sxpr) for nodes of interest
stag-grep.pl person -q name=fred file1.xml

  stag-grep.pl person 'sub {shift->get_name =~ /^A/}' file1.xml

  stag-grep.pl -p My::Foo -w sxpr record 'sub{..}' file2
Parses an input file using the specified parser (which may be a
built-in stag parser, such as xml) and filters the resulting stag tree
according to a user-supplied subroutine, writing out only the
nodes/elements that pass the test.

The parser is event based, so it should be able to handle large files
(although if the node you parse is large, it will take up more memory).

stag-handle.pl

streams a stag file through a handler into a writer
stag-handle.pl -w itext -c my-handler.pl myfile.xml > processed.itext
  stag-handle.pl -w itext -p My::Parser -m My::Handler myfile.xml > processed.itext
Takes a stag compatible format (xml, sxpr or itext) and turns the data
into an event stream, passing it through my-handler.pl.

stag-join.pl

joins two stag files together based on a common key
stag-join.pl  -w xml country/city_id=capital/capital_id countries.xml capitals.xml

  stag-join.pl  -w itext gene/tax_id=species/tax_id genedb.itext speciesdb.itext
Performs a relational-style INNER JOIN between two stag trees; this
effectively merges two files together, based on some kind of ID in the
file
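The relational-style inner join can be illustrated with a short Python sketch that uses the country/capital example from the synopsis. How stag-join.pl actually shapes the merged output is not specified here, so nesting the matching capital inside a copy of the country is an assumption, as are the function name inner_join and the sample data.

```python
import copy
import xml.etree.ElementTree as ET

COUNTRIES = ET.fromstring(
    "<countries><country><name>France</name><city_id>1</city_id></country>"
    "<country><name>Atlantis</name><city_id>99</city_id></country></countries>")
CAPITALS = ET.fromstring(
    "<capitals><capital><capital_id>1</capital_id><name>Paris</name></capital>"
    "<capital><capital_id>2</capital_id><name>Rome</name></capital></capitals>")

def inner_join(left_root, right_root, left_path, right_path):
    """Join on left_tag/left_key = right_tag/right_key; unmatched nodes are dropped."""
    left_tag, left_key = left_path.split("/")
    right_tag, right_key = right_path.split("/")
    by_key = {}                                   # key value -> right-hand nodes
    for node in right_root.iter(right_tag):
        value = node.findtext(right_key)
        if value is not None:
            by_key.setdefault(value, []).append(node)
    joined = ET.Element(left_root.tag)
    for node in left_root.iter(left_tag):
        for match in by_key.get(node.findtext(left_key), []):
            merged = copy.deepcopy(node)          # leave the inputs untouched
            merged.append(copy.deepcopy(match))
            joined.append(merged)
    return joined

joined = inner_join(COUNTRIES, CAPITALS, "country/city_id", "capital/capital_id")
print(ET.tostring(joined, encoding="unicode"))
```

As with an SQL INNER JOIN, the country with no matching capital and the capital with no matching country are both absent from the result.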

stag-merge.pl

stag-merge.pl  -xml file1.xml file2.xml
script wrapper for the Data::Stag modules

stag-mogrify.pl

mangle stag files
stag-mogrify.pl  -w itext file1.xml file2.xml
script wrapper for the Data::Stag modules

Feeds files into a parser object that generates nestarray events,
and feeds the events into a handler/writer class.

stag-parse.pl

parses a file and fires events (e.g. sxpr to xml)
# convert XML to IText
  stag-parse.pl -p xml -w itext file1.xml file2.xml

  # use a custom parser/generator and a custom writer/generator
  stag-parse.pl -p MyMod::MyParser -w MyMod::MyWriter file.txt
script wrapper for the Data::Stag modules

Feeds files into a parser object that generates nestarray events,
and feeds the events into a handler/writer class.

stag-query.pl

aggregate queries
stag-query.pl avg person/age file.xml

  stag-query.pl sum person/salary file.xml

  stag-query.pl 'sub { $agg .= ", ".shift }' person/name file.xml
Performs aggregate queries
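The avg and sum queries in the synopsis amount to collecting every value at a path and reducing the list. A minimal Python sketch of that idea follows; the aggregate function and sample data are hypothetical, only the two aggregate names shown in the synopsis are covered, and values are assumed to be numeric.

```python
import xml.etree.ElementTree as ET
from statistics import mean

SAMPLE = ("<db><person><age>30</age><salary>1000</salary></person>"
          "<person><age>40</age><salary>2500</salary></person></db>")

# Only the aggregate names that appear in the synopsis above.
AGGREGATES = {"avg": mean, "sum": sum}

def aggregate(root, func, path):
    """Apply the named aggregate to every numeric value found at path."""
    values = [float(node.text) for node in root.findall(".//" + path)]
    return AGGREGATES[func](values)

root = ET.fromstring(SAMPLE)
average_age = aggregate(root, "avg", "person/age")
total_salary = aggregate(root, "sum", "person/salary")
print(average_age, total_salary)
```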

stag-splitter.pl

splits a stag file into multiple files
stag-splitter.pl -split person -name social_security_no file.xml
Splits a file using a user-specified parser (default xml) around a
specified split node, naming each file according to the name argument.

The files will be named anonymously unless the '-name' switch is
specified; this will use the value of the specified element as the
filename.

For example, if we have
<top>
    <a>
      <b>foo</b>
      <c>yah</c>
      <d>
        <e>xxx</e>
      </d>
    </a>
    <a>
      <b>bar</b>
      <d>
        <e>wibble</e>
      </d>
    </a>
  </top>

if we run

  stag-splitter.pl -split a -name b

it will generate two files, "foo.xml" and "bar.xml"

The input format can be 'xml', 'sxpr' or 'itext'; if this is left blank,
the format will be guessed from the file suffix.

The output format defaults to the same as the input format, but
another can be chosen.

Files go in the current directory, but this can be overridden with the
'-dir' switch.
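The splitting behaviour described above can be sketched in Python using the same sample document. This is an illustration only, not the stag-splitter.pl implementation; the split function is hypothetical, and the fallback name used when the naming element is missing is an assumption (the script's anonymous naming scheme is not documented here).

```python
import os
import tempfile
import xml.etree.ElementTree as ET

SAMPLE = ("<top><a><b>foo</b><c>yah</c><d><e>xxx</e></d></a>"
          "<a><b>bar</b><d><e>wibble</e></d></a></top>")

def split(xml_text, split_tag, name_tag, out_dir):
    """Write each <split_tag> node to its own file and return the file names."""
    written = []
    for n, node in enumerate(ET.fromstring(xml_text).iter(split_tag), start=1):
        # Hypothetical fallback name for nodes without a naming element.
        name = node.findtext(name_tag) or f"{split_tag}-{n}"
        node.tail = None                 # do not carry over inter-node whitespace
        path = os.path.join(out_dir, name + ".xml")
        with open(path, "w", encoding="utf-8") as handle:
            handle.write(ET.tostring(node, encoding="unicode"))
        written.append(os.path.basename(path))
    return written

# A temporary directory stands in for the '-dir' target.
with tempfile.TemporaryDirectory() as out_dir:
    names = split(SAMPLE, "a", "b", out_dir)
    on_disk = sorted(os.listdir(out_dir))
print(names)
```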

stag-view.pl

draws an expandable Tk tree diagram showing stag data
stag-view.pl  file1.xml
Draws a Tk tree, with expandable/convertible nodes

DBIx::DBStag Script List


selectall_html.pl

selectall_html.pl -d "dbi:Pg:dbname=mydb;host=localhost" "SELECT * FROM a NATURAL JOIN b"

    

selectall_xml.pl

selectall_xml.pl [-d <dbi>] [-f file of sql] [-nesting|n <nesting>] SQL
This script will query a database using either SQL provided by the
script user, or using an SQL template; the query results will be
turned into XML using the DBIx::DBStag module. The nesting of the
XML can be controlled by the DBStag SQL extension "USE NESTING..."

stag-autoddl.pl

stag-autoddl.pl -parser XMLAutoddl -handler ITextWriter file1.txt file2.txt

  stag-autoddl.pl -parser MyMod::MyParser -handler MyMod::MyWriter file.txt
script wrapper for the Data::Stag modules

stag-bulkload.pl

creates bulkload SQL for input data
# generate bulkload SQL from two XML files
  stag-bulkload.pl -l person file1.xml file2.xml

  # use a custom parser
  stag-bulkload.pl -p MyMod::MyParser file.txt
Creates bulkload SQL statements for an input file

Works only with certain kinds of schemas, where the FK relations make
a tree (not a graph); i.e. the only FKs are to the parent

You do not need a connection to the DB

It is of no use for incremental loading: it assumes integer surrogate
primary keys and starts these from 1.
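The constraints above (tree-shaped foreign keys, surrogate keys starting from 1, no database connection) can be illustrated with a small Python sketch that turns a nested document into INSERT statements. It is not the SQL that stag-bulkload.pl emits; the column naming convention (table_id for primary and foreign keys), the function names and the sample data are all assumptions.

```python
import xml.etree.ElementTree as ET

SAMPLE = ("<db><person><name>fred</name>"
          "<address><city>x</city></address><address><city>y</city></address></person>"
          "<person><name>o'neil</name></person></db>")

def sql_literal(value):
    """Render an int as-is and a string as a quoted SQL literal."""
    if isinstance(value, int):
        return str(value)
    return "'" + value.replace("'", "''") + "'"

def bulkload_sql(root):
    """Emit one INSERT per non-leaf node; leaf children become columns."""
    next_id = {}                       # table -> next surrogate key, starting at 1
    statements = []

    def visit(node, parent_table=None, parent_id=None):
        table = node.tag
        row_id = next_id.get(table, 1)
        next_id[table] = row_id + 1
        cols = {table + "_id": row_id}
        if parent_table is not None:   # the only foreign key points at the parent
            cols[parent_table + "_id"] = parent_id
        for child in node:
            if len(child) == 0:
                cols[child.tag] = child.text or ""
        names = ", ".join(cols)
        values = ", ".join(sql_literal(v) for v in cols.values())
        statements.append(f"INSERT INTO {table} ({names}) VALUES ({values});")
        for child in node:
            if len(child) > 0:
                visit(child, table, row_id)

    for record in root:                # the root element is treated as a container
        visit(record)
    return statements

statements = bulkload_sql(ET.fromstring(SAMPLE))
print("\n".join(statements))
```

Because every key is generated locally from a counter, the output only makes sense when loaded into empty tables, which is why this approach does not suit incremental loading.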

stag-ir.pl

information retrieval using a simple relational index
stag-ir.pl -r person -k social_security_no -d Pg:mydb myrecords.xml
  stag-ir.pl -d Pg:mydb -q 999-9999-9999 -q 888-8888-8888
Indexes stag nodes (XML Elements) in a simple relational db structure
- keyed by ID with an XML Blob as a value

Imagine you have a very large file of data, in a stag compatible
format such as XML. You want to index all the elements of type
person; each person can be uniquely identified by
social_security_no, which is a direct subnode of person.

The first thing to do is to build the index, which will be stored
in the database mydb:
stag-ir.pl -r person -k social_security_no -d Pg:mydb myrecords.xml

You can then use the index in mydb to retrieve person nodes by
their social security number:

  stag-ir.pl -d Pg:mydb -q 999-9999-9999 > some-person.xml

You can export using different stag formats

  stag-ir.pl -d Pg:mydb -q 999-9999-9999 -w sxpr > some-person.sxpr

You can retrieve multiple nodes (although these need to be rooted to
make a valid file)

  stag-ir.pl -d Pg:mydb -q 999-9999-9999 -q 888-8888-8888 -top personset

Or you can use a list of IDs from a file (newline delimited)

  stag-ir.pl -d Pg:mydb -qf my_ss_nmbrs.txt -top personset
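The "ID plus XML blob" structure can be sketched in Python with an in-memory SQLite database standing in for the Pg database used above. The table name node_index, its column names and the helper functions are assumptions for illustration; the actual table layout used by stag-ir.pl is not described here.

```python
import sqlite3
import xml.etree.ElementTree as ET

SAMPLE = (
    "<records>"
    "<person><social_security_no>999-9999-9999</social_security_no><name>fred</name></person>"
    "<person><social_security_no>888-8888-8888</social_security_no><name>jim</name></person>"
    "</records>"
)

def build_index(conn, xml_text, record, key):
    """Store each record as an XML blob keyed by the value of its key element."""
    conn.execute("CREATE TABLE IF NOT EXISTS node_index "
                 "(id TEXT PRIMARY KEY, xml TEXT NOT NULL)")
    for node in ET.fromstring(xml_text).iter(record):
        node_id = node.findtext(key)
        if node_id is not None:
            conn.execute("INSERT OR REPLACE INTO node_index VALUES (?, ?)",
                         (node_id, ET.tostring(node, encoding="unicode")))
    conn.commit()

def fetch(conn, ids, top):
    """Look up each ID and wrap the stored nodes in a single root element."""
    wrapper = ET.Element(top)
    for node_id in ids:
        row = conn.execute("SELECT xml FROM node_index WHERE id = ?",
                           (node_id,)).fetchone()
        if row is not None:
            wrapper.append(ET.fromstring(row[0]))
    return ET.tostring(wrapper, encoding="unicode")

conn = sqlite3.connect(":memory:")
build_index(conn, SAMPLE, "person", "social_security_no")
result = fetch(conn, ["888-8888-8888", "000-0000-0000"], "personset")
print(result)
```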

stag-pgslurp.pl

stag-pgslurp.pl -d "dbi:Pg:dbname=mydb;host=localhost" myfile.xml
This script is for storing data (specified in a nested file format
such as XML or S-Expressions) in a database. It assumes a database
schema corresponding to the tags in the input data already exists.

stag-sl2sql.pl


    

    

stag-storenode.pl

stag-storenode.pl -d "dbi:Pg:dbname=mydb;host=localhost" myfile.xml
This script is for storing data (specified in a nested file format
such as XML or S-Expressions) in a database. It assumes a database
schema corresponding to the tags in the input data already exists.