Create a structured streaming spark job that streams from a kafka topic and then calls a python flask app and stores the returned data back in a new kafka topic

This project includes writing a simple SPARK APPLICATION in SCALA. This will be launched with spark-submit option.

I prefer if we use structured streaming in KAFKA: [login to view URL]

Please make it a maven project with [login to view URL] so it can be built easily!


a) A kafka topic exists with the name "topicA"


b) When you subscribe to that topic you get the following output for example in JSON format:

{"destination":"213","source":"111","productId":"10","submittedTs":"20003","userAgent":"v7","_firstEts":"1922","content":"This is an example c0ntent","accountId":"111-222","contentEncoding":"NORMAL","channel":"xxc"}

{"destination":"213","source":"234","productId":"10","submittedTs":"2409`","userAgent":"v7","_firstEts":"2019","content":"Another example content","accountId":"111-222","contentEncoding":"NORMAL","channel":"xxc"}

{"destination":"124","source":"55565","productId":"12","submittedTs":"44323","userAgent":"v7","_firstEts":"2022","content":"Blah blah food","accountId":"111-222","contentEncoding":"NORMAL","channel":"xxc"}

{"destination":"2232","source":"511","productId":"11","submittedTs":"54124","userAgent":"v7","_firstEts":"2302","content":"the quick brown fox jumped!","accountId":"111-222","contentEncoding":"NORMAL","channel":"xxc"}


c) A spark job will be able to stream data from a kafka topic and then extract certain fields from topicA namely:

This should be written in a way where it is easy to add/remove more fields that want to be extracted from it.

-) accountId

-) submittedTs

-) source

-) destination

-) content


d) and this will then be sent to a python flask server which will return back the following to the spark application:

Note: 1 new field comes back from flask server called "feature"

This should also be written in a way where it is easy to add/remove fields that come back from the flask python server.

-) accountId

-) content

-) feature

-) submittedTs

{"accountId":"111-222","content":"This is an example c0ntent","feature":"hello","submittedTs":"20003"}

{"accountId":"111-222","content":"Another example content":"hello","submittedTs":"2409"}

{"accountId":"111-222","content":"Blah blah food","feature":"hello","submittedTs":"44323"}

{"accountId":"111-222","content":"the quick brown fox jumped!","feature":"hello","submittedTs":"54124"}


e) the spark application will then dump all this in a new kafka topic called topicB in a JSON format, exactly like the format is stored in topicA.


f) spark application will be submitted with the following application arguments









So our spark-submit command will look like:

spark-submit --master <$master> --num-executors 2 --total-executor-cores 1 --executor-cores 1 --executor-memory 1g --driver-memory 1g --name my-spark-job --class [login to view URL] --queue default --deploy-mode cluster /home/ubuntu/[login to view URL] <$kafka-bootstrap-server> <$input-kafka-topic> <$flask-server-port> <$kafka-group> <$output-kafka-topic> <$flask-server-endpoint>

and with argument values filled in:

spark-submit --master yarn --num-executors 2 --total-executor-cores 1 --executor-cores 1 --executor-memory 1g --driver-memory 1g --name my-spark-job --class [login to view URL] --queue default --deploy-mode cluster /home/ubuntu/[login to view URL] [login to view URL] topicA 5001 kafka_group_1 topicB http://localhost/default/post/

I can give you the code for flask python app so you can help test it. It is attached with this ([login to view URL]) ticket. Run it with command: python [login to view URL]

- You can CURL it to test locally:

curl --header "Content-Type: application/json" --request POST --data '{"accountId":"111-222","submittedTs":"2000","source":"ABC","destination":"XYZ","content":"blablabla"}' localhost:5001/default/post/

Fähigkeiten: Skala, Spark, Apache Maven, Flask

Mehr darüber create webform calls python script, spark structured streaming kafka avro, spark structured streaming kafka json, spark structured streaming kafka example java, spark structured streaming example, spark structured streaming kafka sink, spark structured streaming kafka example, spark structured streaming json, dockerize python flask app, spark structured streaming checkpoint, structured streaming vs spark streaming, spark structured streaming output mode, spark structured streaming join, spark structured streaming kafka avro scala, spark structured streaming kafka python, spark structured streaming kafka parquet, spark structured streaming kafka multiple topics, spark structured streaming kafka json java, deploy python flask app on azure, deploy python flask app on tomcat

Über den Arbeitgeber:
( 1 Bewertung ) Hammersmith, United Kingdom

Projekt-ID: #25216609

Vergeben an:


working as an architect, good experience of spark , read you requirements, let's discuss more. would be happy to help

£128 GBP in 3 Tagen
(0 Bewertungen)