largeRCRF-Java

Author	SHA1	Message	Date
Joel Therrien	54af805d4d	Fairly significant refactoring; Made Forest into an abstract class with OnlineForest (in memory; same as previous) and OfflineForest (reads individual trees only as needed). Many methods were changed.	2019-10-02 13:54:45 -07:00
Joel Therrien	79a9522ba7	Several changes - Fixed some tests that weren't running. Fixed a bug where training crashed if FactorCovariates had any NA. Fixed a bug where FactorCovariates were ignored in splitting if nsplit==0. Added a covariate-specific option for whether splitting on an NA variable should have a penalty. This penalty is accomplished by first calculating the split score and best split for a covariate without NAs, as done previously. Then NAs are randomly assigned, and the split score is recalculated on that best split. The final score is the lower of the recalculated score and the original.	2019-08-28 18:07:35 -07:00
Joel Therrien	c24626ff61	Upgrade to latest version of Jackson databind.	2019-08-27 14:53:49 -07:00
Joel Therrien	51696e2546	Fix how variable importance works to be tree based and not forest based	2019-08-12 14:28:31 -07:00
Joel Therrien	f1c5b292ed	Add serialVersionUID to Serializable classes. This makes forests more compatible between versions if only method definitions changed.	2019-08-02 15:10:48 -07:00
Joel Therrien	a56ad4433d	Add variable importance methods to library	2019-07-29 12:21:35 -07:00
Joel Therrien	f23ee21ef3	Add utility function for R package	2019-07-24 14:26:08 -07:00
Joel Therrien	186de413ed	Merge branch 'master' into ibs	2019-07-22 11:38:15 -07:00
Joel Therrien	aa1f544ea2	Upgrade Jackson in executable project. Jackson has a security vulnerability that can only be exploited in a server environment; it is not relevant here, but we might as well update.	2019-07-22 11:36:29 -07:00
Joel Therrien	86f6c195d7	Fix unused imports	2019-07-22 11:29:37 -07:00
Joel Therrien	9258f75e4e	Add integrated Brier score error measure	2019-07-22 11:23:34 -07:00
Joel Therrien	7371dab4f1	Fix two bugs in how RightContinuousStepFunction is integrated. Specifically, the integration returned NaN if the integration was up to NaN (real integrals are robust); and the results were negative if integrating from a to b where a > b.	2019-07-19 14:14:36 -07:00
Joel Therrien	ae9a6b9a3f	Add binary and unary operations to RightContinuousStepFunctions	2019-07-18 15:14:01 -07:00
Joel Therrien	d7cdc9f6e7	Split into two Maven modules. One for the library (same as before), the other for the executable part.	2019-07-03 09:37:19 -07:00
Joel Therrien	ee4b513298	Remove dependencies from project. This project is now purely a library; the code for running directly from the command line will be put into a new project. This was important because we were including large dependencies into the R code that weren't needed and created some minor licensing inconveniences.	2019-07-02 16:54:58 -07:00
Joel Therrien	bc2c240823	Update Jackson to 2.9.9. Apparently older versions of Jackson contain a security vulnerability (not really important for this project, given that users are only ever using Jackson on their own settings files).	2019-07-02 15:39:33 -07:00
Joel Therrien	21ce7ce135	Removed some unnecessary dependencies in a test file.	2019-06-28 15:25:46 -07:00
Joel Therrien	88dc1c2316	Add README file	2019-06-28 10:54:42 -07:00
Joel Therrien	78ca8bad73	Update pom to build package with minimal dependencies	2019-06-27 15:34:09 -07:00
Joel Therrien	22accdb263	Add support for providing an initial forest to add trees to	2019-06-07 19:55:44 -07:00
Joel Therrien	7da3bd14a5	Add test verifying that CIFs are averaged together in the same way as randomForestSRC	2019-06-05 15:13:08 -07:00
Joel Therrien	4aac73b868	Improve progress tracking to terminate when all trees are trained, instead of only checking every second	2019-05-29 15:05:48 -07:00
Joel Therrien	1e40b7ff9b	Add missing copyright notices	2019-05-10 16:02:59 -07:00
Joel Therrien	6f318db79e	Add support for seeds to control randomness when training forests	2019-05-10 16:02:33 -07:00
Joel Therrien	17ae3a9f5a	Refactor - rename GroupDifferentiators into SplitFinders. SplitRule would have made more sense but it was already taken.	2019-05-08 16:09:09 -07:00
Joel Therrien	c5c74ad7e9	Fix bug where parallel forests never finish. Add test to detect the case.	2019-04-29 11:02:14 -07:00
Joel Therrien	de3de300cf	Change how parallel trees are trained to be more robust to Threads getting terminated. This will hopefully make the package more stable on cluster systems, where sometimes the forests immediately stall.	2019-04-26 11:04:03 -07:00
Joel Therrien	1e643385ee	Adjust MultipleLogRankDifferentiators to use actual implementation found in randomForestSRC. Also merge SingleLogRankDifferentiators with the Group variants, as they now reduce to the simple case when given only one event of focus.	2019-04-23 17:34:39 -07:00
Joel Therrien	c6a5787975	Use UUIDs to save trees instead of tree number. Benefits are for when we restart a previously parallel task in which, say, trees 1, 2, and 4 were completed but tree 3 never did complete. Under the previous implementation we'd start at tree 4 (we'd just count how many trees were done). To fix this would require some additional effort. Since the order of trees is irrelevant, it made sense to just stop ordering them.	2019-04-16 12:58:23 -07:00
Joel Therrien	fb20b08a23	Add debugging information to RUtils	2019-04-16 12:55:57 -07:00
Joel Therrien	f1f507d2df	Add out-of-bag predict methods for R to use	2019-04-07 15:11:06 -07:00
Joel Therrien	50b4a3cd89	Fix and optimize how progress is displayed while training trees	2019-04-05 11:22:59 -07:00
Joel Therrien	ea176cff9a	Add an evaluateSerial function to Forest. parallelStream doesn't seem to work very well on the ComputeCanada.ca cluster	2019-04-05 11:13:23 -07:00
Joel Therrien	bf168bc2a5	Improve performance of CompetingRiskFunctionCombiner. Estimated speedup is at least 10x.	2019-03-27 21:06:15 -07:00
Joel Therrien	c8269ae285	Fix bug in test	2019-03-27 21:06:05 -07:00
Joel Therrien	526a127b9d	Change Forest to keep trees in a List instead of a Collection	2019-03-27 10:57:01 -07:00
Joel Therrien	585d6d3c5b	Make SplitRules their own class; independent of their Covariate parents. This was done so that when we serialize trees (and thus SplitRules) we don't awkwardly also serialize ntree versions of the Covariates, which is really awkward when deserializing them.	2019-03-25 14:44:31 -07:00
Joel Therrien	76b2cdd3c4	WIP - Some changes to how trees are saved.	2019-03-25 10:59:55 -07:00
Joel Therrien	76614ee68b	Better memory management to help prevent OutOfMemoryExceptions	2019-03-25 10:59:26 -07:00
Joel Therrien	d65e010c48	Very minor improvement to how tree filenames are saved.	2019-03-13 11:21:51 -07:00
Joel Therrien	02b7a5cb9a	Rebrand project as largeRCRF	2019-03-13 10:47:37 -07:00
Joel Therrien	cfa3a6f432	Attempting memory optimizations	2019-03-13 10:39:18 -07:00
Joel Therrien	8014bd4629	Fix bug where NAs cause crash	2019-03-04 11:36:21 -08:00
Joel Therrien	91cf299362	Add some R utility functions to help data get quickly loaded	2019-03-04 11:23:31 -08:00
Joel Therrien	29b154110a	Fix bug where template.yaml gets replaced whenever user wants to look at help dialog	2019-02-28 11:12:21 -08:00
Joel Therrien	a7f591c2d3	Add integration capabilities to RightContinuousStepFunction; use it for calculating mortality	2019-02-18 14:57:26 -08:00
Joel Therrien	e74ba23177	Small changes; more tests.	2019-02-18 14:57:13 -08:00
Joel Therrien	9f513ab75b	Add capabilities to get nodes of a certain type in a forest; used to produce summary statistics	2019-02-02 09:36:00 -08:00
Joel Therrien	77ec780304	Fix theoretical bug	2019-01-29 13:38:45 -08:00
Joel Therrien	d8e52ecd82	Add tests around NumericSplitRuleUpdater; fix minor bug.	2019-01-23 19:37:10 -08:00


117 commits
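
Commit c6a5787975 above describes saving trees under UUIDs rather than tree numbers, so that a restarted parallel run only needs to count the finished trees and never has to work out which indices are missing. The following is a minimal sketch of that idea, not the project's actual code: the class name `UuidTreeStore`, the `.tree` extension, and the method names are all hypothetical, and a tree is represented simply as an already-serialized byte array.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.UUID;

// Hypothetical helper; names and file layout are assumptions for illustration only.
final class UuidTreeStore {

    // Assumed file extension for saved trees.
    private static final String EXTENSION = ".tree";

    /** Writes one serialized tree under a fresh random UUID and returns the file path. */
    static Path saveTree(Path directory, byte[] serializedTree) throws IOException {
        Path file = directory.resolve(UUID.randomUUID() + EXTENSION);
        return Files.write(file, serializedTree);
    }

    /** Counts the tree files already present; their order and names carry no meaning. */
    static int countSavedTrees(Path directory) throws IOException {
        int count = 0;
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(directory, "*" + EXTENSION)) {
            for (Path ignored : stream) {
                count++;
            }
        }
        return count;
    }

    /** Number of trees a restarted run still has to train to reach ntree. */
    static int treesRemaining(Path directory, int ntree) throws IOException {
        return Math.max(0, ntree - countSavedTrees(directory));
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("forest");
        // Simulate an interrupted run in which three trees finished, in any order.
        for (int i = 0; i < 3; i++) {
            saveTree(dir, new byte[] {(byte) i});
        }
        System.out.println("Trees still to train: " + treesRemaining(dir, 5));
    }
}
```

Because every file name is unique, a restarted run cannot overwrite a tree that an earlier run completed, and a gap such as the missing "tree 3" in the commit message no longer needs special handling.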