DGraph-GP
Download:
download DGraph-GP (Linux Version)
download DGraph-GP (Unix Version)
download DGraph-GP Interface Addition (Unix Version)
(you may have to change permissions on the programs to execute them)
The DGraph-GP Interface Addition is compiled Java, which is run with
"java Main".
Instructions:
File Format:
Data File: (attribute definitions plus training items)
- First line must contain only an integer, representing the number of attributes each item has (including
the classification attribute)
- The next n lines (where n is the number on the first line) are the attribute definitions, which must be
in the following format:
- Attribute Name followed by a ":" followed by a comma-separated list of possible attribute values followed by a ".".
- The first attribute must be the classification attribute, and must be named "class"
- Continuous attributes are defined by the name followed by ": continuous."
- Each remaining line in the data file should represent one training item, where each attribute is in the
same order as in the attribute definition, separated by a space.
Example: 3 attributes / 4 data items
3
class: ENT, O.
size: continuous.
punctuation: comma, fullstop, other, none.
ENT 4 none
ENT 3 other
O 6 comma
O 4 none
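The layout above can be checked before running dgraph-gp. The following is a minimal Python sketch, not part of DGraph-GP: the function name parse_data_file is hypothetical, and skipping blank lines and treating continuous values as plain numbers are assumptions that the format description does not confirm.

```python
EXAMPLE = """3
class: ENT, O.
size: continuous.
punctuation: comma, fullstop, other, none.
ENT 4 none
ENT 3 other
O 6 comma
O 4 none
"""


def parse_data_file(text):
    """Parse a data file into (attributes, items).

    attributes is a list of (name, domain) pairs, where domain is None for a
    continuous attribute and a list of allowed values otherwise.
    """
    # Assumption: blank lines are ignored (the format description is silent).
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    n = int(lines[0])  # first line: number of attributes, class included

    attributes = []
    for definition in lines[1:1 + n]:
        name, _, values = definition.partition(":")
        values = values.strip()
        if not values.endswith("."):
            raise ValueError("definition must end with '.': " + definition)
        values = values[:-1].strip()
        if values == "continuous":
            domain = None
        else:
            domain = [v.strip() for v in values.split(",")]
        attributes.append((name.strip(), domain))

    if attributes[0][0] != "class":
        raise ValueError("the first attribute must be named 'class'")

    items = []
    for row in lines[1 + n:]:
        fields = row.split()
        if len(fields) != n:
            raise ValueError("expected %d values: %s" % (n, row))
        for (name, domain), field in zip(attributes, fields):
            if domain is None:
                float(field)  # assumption: continuous values are numeric
            elif field not in domain:
                raise ValueError("bad value for %s: %s" % (name, field))
        items.append(fields)
    return attributes, items


attributes, items = parse_data_file(EXAMPLE)
print(len(attributes), len(items))  # prints "3 4"
```

Running the sketch on the example confirms 3 attribute definitions and 4 training items; a malformed line raises ValueError instead of being passed on to dgraph-gp.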
Test File:
- Each line in the test file should represent one test item, where each attribute is in the same order
as in the attribute definition, separated by a space.
Command Line:
./dgraph-gp {-}{t}{B}{m}{M} DATA_FILE {TEST_FILE} LOOK_AHEAD PROBABILITY_OF_JOIN
{} denotes an optional parameter which may or may not be included
- - if any optional parameters are to be included (use this only once: do NOT add whitespace)
- t if a test file is to be passed to dgraph-gp (i.e., if {TEST_FILE} is given)
- B use AdaBoost algorithm (test file must be supplied)
- m allow multiple joins at a given decision node
- M make leaf nodes as pure as possible
- DATA_FILE the name of your data file (attributes plus training items). Must have a '.dta'
extension.
- TEST_FILE the name of your test file. Usually a '.tst' extension.
- PROBABILITY_OF_JOIN value between 0.0 and 0.8. Represents the prior probability (0% to 80%) that a decision node will be a join node (i.e., that it will have more than one parent node).
- LOOK_AHEAD value between 0 and 4. Represents the depth to which the graph will be expanded before a given split is decided. This is used to avoid local maxima.
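The argument order and value ranges above can be assembled programmatically. The following is a minimal Python sketch, not part of DGraph-GP: the function build_command and its parameter names are hypothetical, LOOK_AHEAD is assumed to be an integer, and the option letters are emitted in the order shown in the usage line (whether dgraph-gp requires that order is not stated).

```python
def build_command(data_file, look_ahead, join_probability, test_file=None,
                  boost=False, multiple_joins=False, pure_leaves=False,
                  executable="./dgraph-gp"):
    """Return the argument list for one dgraph-gp run, validating the ranges."""
    if not data_file.endswith(".dta"):
        raise ValueError("DATA_FILE must have a '.dta' extension")
    if not 0 <= look_ahead <= 4:
        raise ValueError("LOOK_AHEAD must be between 0 and 4")
    if not 0.0 <= join_probability <= 0.8:
        raise ValueError("PROBABILITY_OF_JOIN must be between 0.0 and 0.8")
    if boost and test_file is None:
        raise ValueError("AdaBoost (B) requires a test file")

    # Option letters follow a single "-" with no whitespace in between.
    flags = ""
    if test_file is not None:
        flags += "t"
    if boost:
        flags += "B"
    if multiple_joins:
        flags += "m"
    if pure_leaves:
        flags += "M"

    argv = [executable]
    if flags:
        argv.append("-" + flags)
    argv.append(data_file)
    if test_file is not None:
        argv.append(test_file)
    argv += [str(look_ahead), str(join_probability)]
    return argv


cmd = build_command("train.dta", 2, 0.3, test_file="train.tst", boost=True)
print(" ".join(cmd))  # prints "./dgraph-gp -tB train.dta train.tst 2 0.3"
```

The file names train.dta and train.tst are placeholders. The returned list can be handed to subprocess.run(cmd) to launch the program.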
Interface Addition:
- Type "java Main" to run
- Depending on how the system you are using interacts with Java, you may have to generate the decision
graph using the command line described above, before loading the result into the interface.
- To load an existing graph:
- select "File >Load Training File" and select your data file (training file)
- select "File >Load Existing Graph" and select the graph file
- NB: the graph file output of dgraph is the name of your data file (training file) with a ".graph" extension
- Within a few seconds the graph will appear on the screen
- select "File >Load Test File" and select a test file to classify its items
- select "Display >Statistics" to see the statistics for the training and test data
- select "Display >Node Details" to see the details for a given decision node. Use the IDs from the
graph displayed. (here "path" is the list of decision node IDs through which each data item has passed)