A Jsoup snippet that reads a list of URLs from a file (one per line), scans each page for links matching a selector, and writes the matching href values to an output file.
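import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.StringJoiner;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;

public void run(final String in, final String out, final String match) throws IOException {
    // Collects one href per output line
    final StringJoiner joiner = new StringJoiner("\n");
    try (BufferedReader br = new BufferedReader(new FileReader(in))) {
        // Each line of the input file is a URL to scan
        for (String line; (line = br.readLine()) != null; ) {
            try {
                System.out.println("Scanning: " + line);
                // timeout(0) disables the timeout, so a slow server can block indefinitely
                for (final Element link : Jsoup.connect(line)
                        .timeout(0)
                        .get()
                        .select(match)) {
                    joiner.add(link.attr("href"));
                }
            } catch (Exception e) {
                // Report the failing URL and move on to the next one
                System.err.println("Error scanning " + line + ": " + e.getMessage());
            }
        }
    }
    // Write explicitly as UTF-8 rather than relying on the platform default charset
    Files.write(Paths.get(out), joiner.toString().getBytes(StandardCharsets.UTF_8));
}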
Here the 'match' argument is a Jsoup CSS selector, such as a[href^=http://], which selects every a element whose href attribute starts with "http://".
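To check what a selector picks up without fetching any pages, one option is to run it against an HTML string parsed with Jsoup.parse. The sketch below assumes Jsoup is on the classpath; the class name SelectorDemo, the helper matchingHrefs, and the sample HTML are all hypothetical, made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

// Hypothetical demo class: tries a selector on in-memory HTML (no network access)
public class SelectorDemo {

    // Hypothetical helper: returns the raw href of every element matching the selector
    static List<String> matchingHrefs(String html, String selector) {
        Document doc = Jsoup.parse(html);
        List<String> hrefs = new ArrayList<>();
        for (Element link : doc.select(selector)) {
            hrefs.add(link.attr("href"));
        }
        return hrefs;
    }

    public static void main(String[] args) {
        // Made-up sample page with an http link, an https link and a relative link
        String html = "<a href='http://example.com/a'>A</a>"
                + "<a href='https://example.com/b'>B</a>"
                + "<a href='/relative'>C</a>";
        // Only the first link starts with "http://"
        System.out.println(matchingHrefs(html, "a[href^=http://]"));
        // The shorter prefix "http" also matches the https link
        System.out.println(matchingHrefs(html, "a[href^=http]"));
    }
}
```

Note that with a[href^=http://] relative links such as /relative are skipped, and so are https links; a[href^=http] is a looser prefix if both schemes are wanted.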