Snippet to load a list of URLs from a file (one per line), scan each page for link tags, and write the matched href values to an output file, using Jsoup.
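import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.StringJoiner;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;

// in: file with one URL per line; out: file to write the hrefs to; match: Jsoup selector
public void run(final String in, final String out, final String match) throws IOException {
    final StringJoiner joiner = new StringJoiner("\n");
    try (BufferedReader br = new BufferedReader(new FileReader(in))) {
        for (String line; (line = br.readLine()) != null; ) {
            try {
                System.out.println("Scanning: " + line);
                // timeout(0) means no timeout, so a slow host can block indefinitely
                for (final Element link : Jsoup.connect(line)
                        .timeout(0)
                        .get()
                        .select(match)) {
                    joiner.add(link.attr("href"));
                }
            } catch (Exception e) {
                // Log and move on so one bad URL does not abort the whole run
                System.err.println("Error: " + e.getMessage());
            }
        }
    }
    // Write with an explicit charset instead of the platform default
    Files.write(Paths.get(out), joiner.toString().getBytes(StandardCharsets.UTF_8));
}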
Where the 'match' arg is a Jsoup selector such as: a[href^=http://]
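To give a feel for what the ^= (starts-with) part of that selector does, here is a rough plain-Java sketch that keeps only href values beginning with a given prefix. It is not how Jsoup implements selectors: the class name PrefixMatchSketch, the method keepWithPrefix, and the sample URLs are all made up for illustration, and this version compares case-sensitively, which may differ from Jsoup's own handling.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper, for illustration only; not part of Jsoup.
class PrefixMatchSketch {

    // Keep only the href values that start with the given prefix,
    // similar in spirit to the attribute selector [href^=prefix].
    static List<String> keepWithPrefix(final List<String> hrefs, final String prefix) {
        return hrefs.stream()
                .filter(href -> href.startsWith(prefix))
                .collect(Collectors.toList());
    }

    public static void main(final String[] args) {
        // Sample values are invented for this sketch.
        final List<String> hrefs = Arrays.asList(
                "http://a.example/page",
                "https://b.example/",
                "/relative/path");
        // Only the first entry starts with "http://"
        System.out.println(keepWithPrefix(hrefs, "http://"));
    }
}
```

Note that a prefix of http:// leaves out https:// links as well as relative ones, so a shorter prefix such as http may be the better choice if both schemes are wanted.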