Opened 10 years ago

Last modified 10 years ago

#7580 new Bugs

boost::asio::tcp SEGV after ioservice stopped, restarted and new

Reported by: tobi@… Owned by: chris_kohlhoff
Milestone: To Be Determined Component: asio
Version: Boost 1.49.0 Severity: Problem
Keywords: Cc:

Description

For solarpowerlog [1] I use boost::asio for the communication via TCP/IP. During an implementation of an new feature I experienced an reproducible segfault when during ioservice->poll()

I hope to give enough information to dig into, but if you need some additional information, please let me know.

Please let me also know if I am (unknowingly) misuse or use the library wrongly.

Thanks coldtobi

What I am doing:

The feature is for sharing a TCP/IP connection within several data sources/sinks in the programm. For this implementation every data source can write and receive at abitrary times where sends have priority over receives. This is realized in the way that receives are interrupted if another data source request a write and then the receive is resumed. The handling of the TCP/IP i/o is done in an dedicated thread (I mark this "[T]" below) , the interruption in the "main" process (marked [M])

So basically the programm does: [T] socket->connect() [T] socket->async_write(); (in parallel running a deadline_timer to catch timeouts) [T] socket->read_some(); (in parallel running a deadline_timer to catch timeouts. Later we will socket->read_some again after we know the amount of pending bytes. and read those. But for this report: The interruption (next step) is usually at the first read_some() ) [M] ioservice->stop(); (this aborts the read_some, )

Here the thread making the IO detects that the ioservice has been stopped via ioservice->stopped() and will then abort the current operation, wait for the next request adn before executing this new request it will call ioservice->reset():

[t] ioservice->reset()

As we now close to the crash, I will elabaroate as precise as possible what is done. (This all now is in the I/O thread)

Note: You can find the code at [3]. I add line numbers for better orientation

creating the deadline-timer: (lines 684+)

	deadline_timer timer(*(this->ioservice));
	boost::posix_time::time_duration td = boost::posix_time::millisec(timeout);
	timer.expires_from_now(td);
	timer.async_wait(boost::bind(&boosthelper_set_result, (int*) &result_timer,
			1));

(boosthelper_set_result is defined in line 451 and just set result_timer to 1 if called)

write the bytes: (line 692+)

        
	boost::asio::async_write(*sockt, boost::asio::buffer(s), write_handler);

s is a std::string and contains the data to be sent. max 255 bytes ASCII)

run the ioservice once to either finish the write or the timeout (line 695)

	size_t num = ioservice->run_one(ec);

run_once returns and the evaluation shows that async_write completed -- num!=0 is checked and the result_timer also

Observation: Wireshark shows that the bytes are not transmitted Then the timer is cancelled and its completion handler catched: (line 703+)

 timer.cancel();
 LOGTRACE(logger, __PRETTY_FUNCTION__ << ": still alive 2f");
 ioservice->poll();
 LOGTRACE(logger, __PRETTY_FUNCTION__ << ": still alive 3");

ioservice->poll() is never returning, SEGV is raised here. Unfortunatly the backtrace is also corrupted.

When not calling the ioservice->stop(), the SEGFAULT is gone too. (Confirmed by working around the need to stop the ioservice by just polling with short timeouts)

Testcase Providing a destilled testcase is unfortuantly difficult. However, you can use solarpowerlog to see the issue. I commmited also sample configuration which allows a easy reproduction of the issue. See the git tree here [2].

If you have debian, you can install the build-dependencies by apt-get build-dep solarpowerlog Otherwise configure will tell you about missing deps :-)

git clone  git://solarpowerlog.git.sourceforge.net/gitroot/solarpowerlog/solarpowerlog
cd solarpowerlog 
git checkout 47c079d3409c867287888f47bedb4f05b1c353b5
./bootstrap.sh
./configure --enable-sharedcomms --enable-sputniksimulator --enable-dummyinverter
make 
# in one shell (this one will not crash)
src/solarpowerlog -c tools/sputnik_simulator/solarpowerlog_shared.conf
# in another shell: (this one which will crash)
src/solarpowerlog -c example_confs/solarpowerlog_shared_sim.conf

After some seconds youl'll see the SEGV:

12839 [0x7f2cc3345760] DEBUG inverter.Simulator2.Comms_SharedConnection.SharedTarget null - virtual void CSharedConnectionSlave::Send(ICommand*): work: 0x7f2cb8003550
12839 [0x7f2cc3345760] DEBUG inverter.Simulator2.Comms_SharedConnection.SharedTarget null -  Not atomic 0x7f2cb8003550
12839 [0x7f2cc3345760] DEBUG inverter.Simulator2.Comms_SharedConnection.SharedTarget null - virtual void CSharedConnectionSlave::Send(ICommand*): submitting work: 0x7f2cb8003550
12839 [0x7f2cc3345760] DEBUG inverter.Simulator.Comms_SharedConnection..Comms_TCP_ASIO null - virtual bool CConnectTCPAsio::AbortAll() Aborting 0 backlog entries
12839 [0x7f2cc3345760] DEBUG inverter.Simulator.Comms_SharedConnection..Comms_TCP_ASIO null - virtual bool CConnectTCPAsio::AbortAll() Done
12839 [0x7f2cc3345760] DEBUG inverter.Simulator.Comms_SharedConnection. null - Ticket for this command is: 0 (current ticket is 0)
12839 [0x7f2cc3345760] DEBUG inverter.Simulator.Comms_SharedConnection. null - CSharedConnectionMaster::Send() ICmd: 0x7f2cb8003550
12964 [0x7f2cbfea1700] TRACE inverter.Simulator.Comms_SharedConnection..Comms_TCP_ASIO null - ioservice stopped (1)
12964 [0x7f2cbfea1700] TRACE inverter.Simulator.Comms_SharedConnection..Comms_TCP_ASIO null - Waiting for work
12964 [0x7f2cbfea1700] DEBUG inverter.Simulator.Comms_SharedConnection..Comms_TCP_ASIO null - ioservice stopped
12964 [0x7f2cbfea1700] TRACE inverter.Simulator.Comms_SharedConnection..Comms_TCP_ASIO null - void CConnectTCPAsio::HandleSend(CAsyncCommand*): now handling: 0x7f2cb8003550
12964 [0x7f2cbfea1700] TRACE inverter.Simulator.Comms_SharedConnection..Comms_TCP_ASIO null - void CConnectTCPAsio::HandleSend(CAsyncCommand*): still alive 1
12964 [0x7f2cc3345760] DEBUG inverter.Simulator.Comms_SharedConnection. null - virtual void CSharedConnectionMaster::ExecuteCommand(const ICommand*) now handling: 0x7f2cb8004460
12964 [0x7f2cbfea1700] TRACE inverter.Simulator.Comms_SharedConnection..Comms_TCP_ASIO null - void CConnectTCPAsio::HandleSend(CAsyncCommand*): still alive 2
12964 [0x7f2cc3345760] DEBUG inverter.Simulator.Comms_SharedConnection. null - virtual void CSharedConnectionMaster::ExecuteCommand(const ICommand*) rescheduling read: 0x7f2cb8003e30
12964 [0x7f2cbfea1700] TRACE inverter.Simulator.Comms_SharedConnection..Comms_TCP_ASIO null - void CConnectTCPAsio::HandleSend(CAsyncCommand*): still alive 2e
12964 [0x7f2cbfea1700] TRACE inverter.Simulator.Comms_SharedConnection..Comms_TCP_ASIO null - void CConnectTCPAsio::HandleSend(CAsyncCommand*): still alive 2f
src/solarpowerlog Segmentation fault.

[1] http://sourceforge.net/projects/solarpowerlog/

[2] http://solarpowerlog.git.sourceforge.net/git/gitweb.cgi?p=solarpowerlog/solarpowerlog;a=commit;h=47c079d3409c867287888f47bedb4f05b1c353b5

[3] http://solarpowerlog.git.sourceforge.net/git/gitweb.cgi?p=solarpowerlog/solarpowerlog;a=blob;f=src/Connections/CConnectTCPAsio.cpp;h=5e3b0a1b13b66cc07e491ed98ad7dfe1f3cc5277;hb=47c079d3409c867287888f47bedb4f05b1c353b5

Change History (2)

comment:1 by anonymous, 10 years ago

Component: Noneasio
Owner: set to chris_kohlhoff

comment:2 by anonymous, 10 years ago

Version: Boost 1.50.0Boost 1.49.0
Note: See TracTickets for help on using tickets.