PyCOMPSs Wordcount test, dividing input file in blocks, only Python dictionaries used as task parameters (run at MareNostrum IV)
COMPSs 3.3

Workflow Type: COMPSs
Stable

Name: Word Count
Contact Person: support-compss@bsc.es
Access Level: public
License Agreement: Apache2
Platform: COMPSs

Description

Wordcount is an application that counts the occurrences of each word in a given set of files.

To allow parallelism, the file is divided into blocks that are processed separately and merged afterwards.
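The split, count, and merge scheme can be sketched in plain Python as follows. This is a minimal sketch, not the application's source. All function names are hypothetical. Three things are assumptions: that blockSize is measured in characters, that each block is extended to the next whitespace so no word is cut in two, and that partial results are merged pairwise. In the real PyCOMPSs application, functions like these would typically be declared as tasks (with the `@task` decorator from `pycompss.api.task`) so the runtime can run them in parallel. Here they run sequentially. Partial results are plain dictionaries, matching the "only Python dictionaries used as task parameters" setup.

```python
def split_into_blocks(text: str, block_size: int) -> list:
    """Split text into chunks of about block_size characters without cutting words (assumed unit)."""
    if block_size < 1:
        raise ValueError("block_size must be a positive integer")
    blocks = []
    start, n = 0, len(text)
    while start < n:
        end = min(start + block_size, n)
        # Extend the block up to the next whitespace so no word spans two blocks.
        while end < n and not text[end].isspace():
            end += 1
        blocks.append(text[start:end])
        start = end
    return blocks


def count_block(block: str) -> dict:
    """Count word occurrences in one block (would be a task in PyCOMPSs)."""
    counts = {}
    for word in block.split():
        counts[word] = counts.get(word, 0) + 1
    return counts


def merge_counts(a: dict, b: dict) -> dict:
    """Merge two partial dictionaries by summing per-word counts."""
    result = dict(a)
    for word, n in b.items():
        result[word] = result.get(word, 0) + n
    return result


def reduce_counts(partials: list) -> dict:
    """Merge partial results pairwise (tree reduction; merge order is an assumption)."""
    if not partials:
        return {}
    while len(partials) > 1:
        merged = [merge_counts(partials[i], partials[i + 1])
                  for i in range(0, len(partials) - 1, 2)]
        if len(partials) % 2:  # carry the unpaired last element to the next round
            merged.append(partials[-1])
        partials = merged
    return partials[0]


def count_words(text: str, block_size: int) -> dict:
    """Split the text into blocks, count each block, then merge the partial counts."""
    return reduce_counts([count_block(b) for b in split_into_blocks(text, block_size)])
```

A smaller block size yields more blocks, and therefore more `count_block` and `merge_counts` calls. This is why a lower blockSize generates more tasks in the workflow.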

Results are written to a binary pickle file, so they can be checked using: python -m pickle result.txt
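The result file can also be inspected from Python. The following is a minimal sketch that assumes the file holds a single pickled dictionary mapping words to counts. The helper names are hypothetical and are not part of the application. Only unpickle files from a trusted source, because loading a pickle can execute arbitrary code.

```python
import pickle


def save_result(counts: dict, result_path: str) -> None:
    """Write the word-count dictionary as a binary pickle (assumed result format)."""
    with open(result_path, "wb") as f:
        pickle.dump(counts, f)


def load_result(result_path: str) -> dict:
    """Read the pickled word-count dictionary back."""
    with open(result_path, "rb") as f:
        return pickle.load(f)


def top_words(counts: dict, n: int = 10) -> list:
    """Return the n most frequent words, ties broken alphabetically."""
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:n]
```

For example, `top_words(load_result("result.txt"), 5)` lists the five most frequent words, while `python -m pickle result.txt` prints the whole dictionary.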

This example also shows how to manually add input or output datasets to the workflow provenance recording (using the 'input' and 'output' terms in the ro-crate-info.yaml file).

Execution instructions

Usage:

runcompss --lang=python $(pwd)/application_sources/src/wordcount_blocks.py filePath resultPath blockSize

where:

  • filePath: Absolute path of the file to parse
  • resultPath: Absolute path to the result file
  • blockSize: Size of each block. The lower the number, the more tasks will be generated in the workflow
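The three positional arguments above could be parsed and validated as in the sketch below. It is hypothetical: the real wordcount_blocks.py may read sys.argv directly. The sketch also assumes two things: that blockSize is measured in bytes, and that one counting task is created per block. Under those assumptions, the number of counting tasks is roughly the file size divided by blockSize, rounded up.

```python
import argparse
import math
import os


def parse_args(argv=None) -> argparse.Namespace:
    """Parse filePath, resultPath and blockSize (hypothetical CLI sketch)."""
    parser = argparse.ArgumentParser(description="Word count over a file split into blocks")
    parser.add_argument("file_path", help="Absolute path of the file to parse")
    parser.add_argument("result_path", help="Path to the result (pickle) file")
    parser.add_argument("block_size", type=int, help="Size of each block")
    args = parser.parse_args(argv)
    if not os.path.isabs(args.file_path):
        parser.error("filePath must be an absolute path")  # exits with status 2
    if args.block_size < 1:
        parser.error("blockSize must be a positive integer")
    return args


def estimated_count_tasks(file_size: int, block_size: int) -> int:
    """Approximate number of counting tasks, assuming one task per block of block_size bytes."""
    return max(1, math.ceil(file_size / block_size))
```

With these assumptions, a 1000-byte file and blockSize 300 give about 4 counting tasks, while blockSize 100 gives about 10.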

Execution Examples

runcompss --lang=python $(pwd)/application_sources/src/wordcount_blocks.py $(pwd)/dataset/data/compss.txt result.txt 300
runcompss $(pwd)/application_sources/src/wordcount_blocks.py $(pwd)/dataset/data/compss.txt result.txt 300
python -m pycompss $(pwd)/application_sources/src/wordcount_blocks.py $(pwd)/dataset/data/compss.txt result.txt 300

Build

No build is required


Version History

COMPSs 3.3 (earliest) Created 15th Dec 2023 at 14:57 by Raül Sirvent

Run using COMPSs version 3.3 on the MareNostrum IV supercomputer, using 1 node (48 cores).


Frozen COMPSs-3.3 4d9de37
Creators and Submitter
Creator
Additional credit

The Workflows and Distributed Computing Team (https://www.bsc.es/discover-bsc/organisation/scientific-structure/workflows-and-distributed-computing/)

Submitter
Citation
Conejero, J. (2023). PyCOMPSs Wordcount test, dividing input file in blocks, only Python dictionaries used as task parameters (run at MareNostrum IV). WorkflowHub. https://doi.org/10.48546/WORKFLOWHUB.WORKFLOW.687.1
Activity


Created: 15th Dec 2023 at 14:57

Attributions

None

Total size: 126 KB